MUTANT AEQUOREA VICTORIA FLUORESCENT PROTEINS
HAVING INCREASED CELLULAR FLUORESCENCE

FIELD OF THE INVENTION 

This invention generally relates to novel proteins, and to
their production, that are useful for detecting gene
expression and for visualizing the subcellular targeting and
distribution of selected proteins and peptides, among other
things. The invention specifically relates to mutations in
the gene coding for the jellyfish Aequorea victoria green
fluorescent protein ("GFP"), which mutations encode mutant GFP
proteins having either an enhanced green or a blue
fluorescence, and to uses for them.

BACKGROUND OF THE INVENTION 

Green fluorescent protein ("GFP") is a monomeric 
protein of about 27 kDa which can be isolated from the 
bioluminescent jellyfish Aequorea victoria. When wild type
GFP is illuminated by blue or ultraviolet light, it emits a
brilliant green fluorescence. Similar to fluorescein
isothiocyanate, GFP absorbs ultraviolet and blue light with a
maximum absorbance at 395 nm and a minor peak of absorbance at
470 nm, and emits green light with a maximum emission at 509
nm with a minor peak at 540 nm. GFP fluorescence persists
even after fixation with formaldehyde, and it is more stable 
to photobleaching than fluorescein. 

The gene for GFP has been isolated and sequenced. 
Prasher, D. C. et al. (1992), "Primary structure of the
Aequorea victoria green fluorescent protein," Gene
111:229-233. Expression vectors that comprise the GFP gene or
cDNA have been introduced into a variety of host cells. These
host cells include Chinese hamster ovary (CHO) cells, human
embryonic kidney cells (HEK293), COS-1 monkey cells, myeloma
cells, NIH 3T3 mouse fibroblasts, PtK1 cells, BHK cells, PC12
cells, Xenopus, leech, transgenic zebra fish, transgenic mice,
Drosophila and several plants. The GFP molecules expressed by
these different cells fluoresce similarly to the native
molecules, demonstrating that GFP fluorescence does not
require any species-specific cofactors or substrates.
See, e.g., Baulcombe, D. et al. (1995), "Jellyfish green
fluorescent protein as a reporter for virus infections," The
Plant Journal 7:1045-1053; Chalfie, M. et al. (1994), "Green
fluorescent protein as a marker for gene expression," Science
263:802-805; Inouye, S. & Tsuji, F. (1994), "Aequorea green
fluorescent protein: expression of the gene and fluorescent
characteristics of the recombinant protein," FEBS Letters
341:277-280; Inouye, S. & Tsuji, F. (1994), "Evidence for
redox forms of the Aequorea green fluorescent protein," FEBS
Letters 351:211-214; Kain, S. et al. (1995), "The green
fluorescent protein as a reporter of gene expression and
protein localization," BioTechniques (in press); Kitts, P. et
al. (1995), "Green Fluorescent Protein (GFP): A novel reporter
for monitoring gene expression in living organisms,"
CLONTECHniques X(1):1-3; Lo, D. et al. (1994), "Neuronal
transfection in brain slices using particle-mediated gene
transfer," Neuron 13:1263-1268; Moss, J. B. & Rosenthal, N.
(1994), "Analysis of gene expression patterns in the embryonic
mouse myotome with the green fluorescent protein, a new vital
marker," J. Cell. Biochem., Supplement 18D W161; Niedz, R. et
al. (1995), "Green fluorescent protein: an in vivo reporter of
plant gene expression," Plant Cell Reports 14:403-406; Wu,
G.-I. et al. (1995), "Infection of frog neurons with vaccinia
virus permits in vivo expression of foreign proteins," Neuron
14:681-684; Yu, J. & van den Engh, G. (1995), "Flow-sort and
growth of single bacterial cells transformed with cosmid and
plasmid vectors that include the gene for green-fluorescent
protein as a visible marker," Abstracts of papers presented at
the 1995 meeting on "Genome Mapping and Sequencing," Cold
Spring Harbor, p. 293.

The active GFP chromophore is a hexapeptide which
contains a cyclized Ser-dehydroTyr-Gly trimer at positions
65-67. This chromophore is only fluorescent when embedded
within the intact GFP protein. Chromophore formation occurs
post-translationally; nascent GFP is not fluorescent. The
chromophore is thought to be formed by a cyclization reaction
and an oxidation step that requires molecular oxygen.

Proteins can be fused to the amino (N-) or carboxy
(C-) terminus of GFP. Such fused proteins have been shown to
retain the fluorescent properties of GFP and the functional
properties of the fusion partner. Bian, J. et al. (1995),
"Nuclear localization of HIV-1 matrix protein P17: The use of
A. victoria GFP in protein tagging and tracing," FASEB J.
9:A1279; Flach, J. et al. (1994), "A yeast RNA-binding
protein shuttles between the nucleus and the cytoplasm," Mol.
Cell. Biol. 14:8399-8407; Marshall, J. et al. (1995), "The
jellyfish green fluorescent protein: a new tool for studying
ion channel expression and function," Neuron 14:211-215;
Olmsted, J. et al. (1994), "Green Fluorescent Protein (GFP)
chimeras as reporters for MAP4 behavior in living cells," Mol.
Biol. of the Cell 5:167a; Rizzuto, R. et al. (1995), "Chimeric
green fluorescent protein as a tool for visualizing
subcellular organelles in living cells," Current Biol.
5:635-642; Sengupta, P. et al. (1994), "The C. elegans gene
odr-7 encodes an olfactory-specific member of the nuclear
receptor superfamily," Cell 79:971-980; Stearns, T. (1995),
"The green revolution," Current Biol. 5:262-264; Treinin, M. &
Chalfie, M. (1995), "A mutated acetylcholine receptor subunit
causes neuronal degeneration in C. elegans," Neuron
14:871-877; Wang, S. & Hazelrigg, T. (1994), "Implications for
bcd mRNA localization from spatial distribution of exu protein
in Drosophila oogenesis," Nature 369:400-403.

A number of GFP mutants have been reported.
Delagrave, S. et al. (1995), "Red-shifted excitation mutants of
the green fluorescent protein," Bio/Technology 13:151-154;
Heim, R. et al. (1994), "Wavelength mutations and
posttranslational autoxidation of green fluorescent protein,"
Proc. Natl. Acad. Sci. USA 91:12501-12504; Heim, R. et al.
(1995), "Improved green fluorescence," Nature 373:663-664.
Delagrave et al. (1995) Bio/Technology 13:151-154 isolated
mutants of cloned Aequorea victoria GFP that had red-shifted
excitation spectra. Heim, R. et al. (1994), "Wavelength
mutations and posttranslational autoxidation of green
fluorescent protein," Proc. Natl. Acad. Sci. USA
91:12501-12504, reported a mutant (Tyr66 to His) having a blue
fluorescence, which is herein designated BFP(Tyr67→His).
These references have neither taught nor suggested that their
mutations resulted in an increase in the cellular fluorescence
of the mutant GFPs.

In general, the level of fluorescence of a protein
expressed in a cell depends on several factors, such as number
of copies made of the fluorescent protein, stability of the
protein, efficiency of formation of the chromophore, and
interactions with cellular solvents, solutes and structures.
Although the fluorescent signal from wild type GFP or from the
reported mutants is generally adequate for bulk detection of
abundantly expressed GFP or of GFP-containing chimeras, it is
inadequate for detecting transient low or constitutively low
levels of expression, or for performing fine structural
subcellular localizations. This limitation severely restricts
the use of native GFP or of the reported mutants as a
biochemical and structural marker for gene expression and
morphological studies.

SUMMARY OF THE INVENTION

It is an object of the invention to provide engineered
GFP-encoding nucleic acid sequences that encode modified GFP
molecules having a greater cellular fluorescence than wild
type GFP or previously described recombinant GFP.

It is a further object of this invention to provide 
recombinant vectors containing these modified GFP-encoding 
nucleic acid sequences, which vectors are capable of being 
inserted into a variety of cells (including mammalian and 
eukaryotic cells) and expressing the modified GFP.

It is also an object of this invention to provide
host cells capable of providing useful quantities of
homogeneous modified GFP.

It is yet another object of this invention to
provide peptides that possess a greater cellular fluorescence
than native GFP or unaltered recombinant GFP and that can be
produced in large quantities in a laboratory, by a
microorganism or by a cell in culture.

These and other objects of the invention have been 
accomplished by providing mutant GFP-encoding nucleic acids 
whose gene product exhibits an increased cellular fluorescence 
relative to naturally occurring or recombinantly produced wild 
type GFP ("wtGFP"). In some embodiments, the modified GFPs
possess fluorescent activity that is 50-100 fold greater than
that of unmodified GFP.

The modified proteins of the present invention are
produced by making mutations in a genetic sequence that
result in alterations in the amino acid sequence of the
resulting gene product. Our starting material was a
GFP-encoding nucleic acid wherein a codon encoding an
additional amino acid was inserted at position 2 of the
previously published GFP amino acid sequence (Chalfie et al.,
1994), to introduce a useful restriction site. Due to the
amino acid insertion at position 2 of the GFP amino acid
sequence, our numbering of the GFP amino acids and description
of the amino acid mutations is off by one as compared to the
originally reported wild type GFP sequence (Prasher et al.,
1992). Thus, amino acid 65 by our numbering corresponds to
amino acid 64 of the originally reported wild type GFP, amino
acid 168 corresponds to amino acid 167 of the originally
reported wild type GFP, etc.
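
The off-by-one relationship between the two numbering schemes
can be illustrated with a short Python sketch. It is only an
illustration under stated assumptions: the function names and
the constant FIRST_SHIFTED_POSITION are hypothetical, and the
sketch deliberately handles only positions downstream of the
N-terminal insertion, where the shift is unambiguous.

```python
# Hypothetical helper names; only the shift of one position comes from the text.
OFFSET = 1                  # one amino acid inserted near the N-terminus
FIRST_SHIFTED_POSITION = 4  # assumption: positions from here on are shifted


def to_original_numbering(position: int) -> int:
    """Map a position in this document's numbering to the originally
    reported wild type numbering (Prasher et al., 1992)."""
    if position < FIRST_SHIFTED_POSITION:
        raise ValueError("position is at or upstream of the insertion site")
    return position - OFFSET


def to_document_numbering(position: int) -> int:
    """Map an originally reported position to this document's numbering."""
    if position + OFFSET < FIRST_SHIFTED_POSITION:
        raise ValueError("position is at or upstream of the insertion site")
    return position + OFFSET


for pos in (65, 168):
    print(f"position {pos} here = position {to_original_numbering(pos)} originally")
```

Under these assumptions the two examples given in the text
(65 and 168) map to 64 and 167, respectively.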

A number of the unique mutants described herein were
made from the modified wild type GFP described herein and
derive from the discovery of an unplanned and unexpected
mutation called "SG12", obtained in the course of
site-directed mutagenesis experiments, wherein a
phenylalanine at position 65 of wtGFP was converted to
leucine. A mutant referred to as "SG11," which combined the
phenylalanine 65 to leucine alteration with an isoleucine 168
to threonine substitution and a lysine 239 to asparagine
substitution, gave a further enhanced fluorescence intensity.
The lysine 239 to asparagine substitution does not affect the
fluorescence of GFP; indeed, the C-terminal lysine or
asparagine may be deleted without affecting fluorescence. A
third and further improved GFP mutant was obtained by further
mutating "SG11." This mutant is referred to as "SG25" and, in
addition to the SG11 mutations, contains an additional
mutation, a substitution of a cysteine at position 66 for the
serine normally found at that position in the sequence.

In addition, the invention encompasses novel GFP 
mutants that emit a blue fluorescence. These blue mutants are 
derived from a mutation of the wild type GFP (Heim, R. et al.
(1994), "Wavelength mutations and posttranslational
autoxidation of green fluorescent protein," Proc. Natl. Acad.
Sci. USA 91:12501-12504), in which histidine was substituted
for tyrosine at amino acid position 66. This mutant emits a
blue fluorescence, i.e., it becomes a Blue Fluorescent Protein
(BFP).

Novel BFP mutants having an enhanced blue
fluorescence were made by further modifying this
BFP(Tyr67→His). The introduction of the same mutation used to
generate SG12 (i.e., phenylalanine to leucine at position 65)
into BFP(Tyr67→His) resulted in a new mutant having a brighter
fluorescence, designated "SuperBlue-42" (SB42). A second
independently generated mutation of BFP(Tyr67→His), in which a
valine at position 164 was converted to alanine, also emitted
an enhanced blue fluorescent signal and is referred to as
"SB49." A combination of the above two mutations resulted in
"SB50", which exhibited an even greater fluorescence
enhancement than either of the previous mutations.
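
The relationships among the named green and blue mutants can
be summarized as sets of substitutions. The following Python
sketch is offered only as a bookkeeping aid: the dictionary
layout and the helper name added_mutations are hypothetical,
the substitutions are written in one-letter code using the
position numbering of this document, and no fluorescence
values are implied.

```python
# Substitutions per named mutant, as recited in the summary above.
# Positions follow this document's numbering (insertion included).
MUTANTS = {
    "SG12": {"F65L"},
    "SG11": {"F65L", "I168T", "K239N"},
    "SG25": {"F65L", "I168T", "K239N", "S66C"},
    "BFP(Tyr67->His)": {"Y67H"},
    "SB42": {"Y67H", "F65L"},
    "SB49": {"Y67H", "V164A"},
    "SB50": {"Y67H", "F65L", "V164A"},
}


def added_mutations(child: str, parent: str) -> set:
    """Return the substitutions present in `child` but not in `parent`."""
    return MUTANTS[child] - MUTANTS[parent]


print(added_mutations("SG25", "SG11"))
print(added_mutations("SB50", "BFP(Tyr67->His)"))
```

Representing each mutant as a set makes the stated lineage easy
to check, e.g. that SB50 is the union of the SB42 and SB49
substitutions.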

The novel GFP and BFP mutants of this invention
allow for a significantly more sensitive detection of
fluorescence in host cells than is possible with the wild type
protein. Accordingly, the mutant GFPs provided herein can be
used, among other things, as sensitive reporter molecules to
detect the cell- and tissue-specific expression and subcellular
compartmentalization of GFP or of chimeric proteins comprising
GFP fused to a regulatory sequence or to a second protein
sequence. In addition, these mutations make possible a
variety of one- and two-color protein assays to quantitate
expression in mammalian cells.

DETAILED DESCRIPTION OF THE INVENTION

The present invention comprises mutant nucleic acids 
that encode engineered GFPs having a greater cellular 
fluorescence than either native GFP or unaltered ("wild type") 
recombinant GFP, and the mutant GFPs themselves. It further
comprises a subset of mutant GFPs that are mutant blue
fluorescent proteins ("BFPs") that are derived from a
published BFP, designated BFP(Tyr67→His), wherein the mutant
BFPs have a cellular fluorescence that is at least five times
greater, preferably ten times greater, and most preferably 20
times greater than that of BFP(Tyr67→His). The invention also
encompasses compositions such as vectors and cells that
comprise either the mutant nucleic acids or the mutant protein
gene products. The mutant GFP nucleic acids and proteins may
be used to detect and quantify gene expression in living
cells, and to detect and quantify tissue specific expression
and subcellular distribution of GFP or of GFP fused to other
proteins.
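
The five-, ten- and twenty-fold thresholds recited above reduce
to a simple ratio of mutant to reference signal. The Python
sketch below illustrates that arithmetic only; the function
name, the tier labels, and the assumption that both signals
are measured in the same arbitrary units under identical
conditions are illustrative and are not taken from the
disclosure.

```python
def enhancement_tier(mutant_signal: float, reference_signal: float) -> str:
    """Classify a fold increase in cellular fluorescence.

    Both arguments are assumed to be background-corrected readings in
    the same arbitrary units; the reference is, e.g., BFP(Tyr67->His).
    """
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    fold = mutant_signal / reference_signal
    if fold >= 20:
        return "most preferred (at least 20-fold)"
    if fold >= 10:
        return "preferred (at least 10-fold)"
    if fold >= 5:
        return "at least 5-fold"
    return "below 5-fold"


print(enhancement_tier(120.0, 10.0))
```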

I. General Definitions

Unless defined otherwise, all technical and 
scientific terms used herein have the same meaning as commonly 
understood by one of ordinary skill in the art to which this 
invention belongs. Singleton et al. (1994) Dictionary of
Microbiology and Molecular Biology, second edition, John Wiley
and Sons (New York) provides one of skill with a general
dictionary of many of the terms used in this invention.
Although any methods and materials similar or equivalent to
those described herein can be used in the practice or testing
of the present invention, the preferred methods and materials
are described. For purposes of the present invention, the
following terms are defined below.

The symbols, abbreviations and definitions used
herein are set forth below:

DNA, deoxyribonucleic acid
RNA, ribonucleic acid
mRNA, messenger RNA
cDNA, complementary DNA (enzymatically synthesized from an
mRNA sequence)
A, Adenine
T, Thymine
G, Guanine
C, Cytosine
U, Uracil
GFP, Green Fluorescent Protein
BFP, Blue Fluorescent Protein

Amino acids are sometimes referred to herein by the 
conventional one or three letter codes. 

Wild type green fluorescent protein ("wtGFP") refers
to the 239 amino acid sequence described by Chalfie et al.,
Science 263:802-805, 1994, the nucleotide sequence of which
is set out as SEQ ID NO:1, and the amino acid sequence of
which is set out as SEQ ID NO:2. This sequence differs from
the original 238 amino acid GFP isolated from the
bioluminescent jellyfish Aequorea victoria in that one amino
acid has been inserted after position 2 of the 238 amino acid
sequence. When reference in this application is made to an
amino acid position of GFP, the position is made with
reference to that described by Chalfie et al., supra, and thus
to that of SEQ ID NO:2.

The term "blue fluorescent protein" (BFP) refers to
mutants of wtGFP wherein the tyrosine at position 67 is
converted to a histidine, which mutants emit a blue
fluorescence. The non-limiting prototype is herein designated
BFP(Tyr67→His).

A shorthand designation for mutations that result in 
a change in amino acid sequence is the one or three letter
code for the original amino acid, the number of the position 
of the amino acid in the wtGFP sequence, followed by the one 
or three letter code for the new amino acid. Thus, Phe65Leu 
or F65L both designate a mutation wherein the phenylalanine at 
position 65 of the wtGFP is converted to leucine. 
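
The shorthand just defined is regular enough to be parsed
mechanically. The following Python sketch shows one way this
might be done; the function name parse_mutation and the regular
expression are illustrative assumptions, and the three-letter to
one-letter table is the conventional amino acid code.

```python
import re

# Conventional three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = set(THREE_TO_ONE.values())

# original residue, position, new residue (three- or one-letter code)
_PATTERN = re.compile(r"^([A-Z][a-z]{2}|[A-Z])(\d+)([A-Z][a-z]{2}|[A-Z])$")


def _to_one_letter(code: str) -> str:
    """Normalize a residue code to one-letter form, rejecting unknown codes."""
    one = THREE_TO_ONE.get(code, code)
    if one not in ONE_LETTER:
        raise ValueError(f"unknown amino acid code: {code!r}")
    return one


def parse_mutation(shorthand: str):
    """Return (original, position, new) for e.g. 'Phe65Leu' or 'F65L'."""
    match = _PATTERN.match(shorthand.strip())
    if match is None:
        raise ValueError(f"not a mutation shorthand: {shorthand!r}")
    original, position, new = match.groups()
    return _to_one_letter(original), int(position), _to_one_letter(new)


print(parse_mutation("Phe65Leu"))
print(parse_mutation("F65L"))
```

Both spellings of the example in the text resolve to the same
tuple, which is the property the shorthand is meant to have.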

Salts of any of the proteins described herein will
naturally occur when such proteins are present in (or isolated
from) aqueous solutions of various pHs. All salts of peptides
having the indicated biological activity are considered to be
within the scope of the present invention. Examples include
alkali, alkaline earth, and other metal salts of carboxylic
acid residues, acid addition salts (e.g., HCl) of amino
residues, and zwitterions formed by reactions between
carboxylic acid and amino acid residues within the same
molecule.

The terms "bioluminescent" and "fluorescent" refer
to the ability of GFP or of a derivative thereof to emit light
("emitted or fluorescent light") of a characteristic
wavelength when excited by light which is generally of a
characteristic and different wavelength than that of the
emitted light.

The term "cellular fluorescence" denotes the
fluorescence of a GFP-derived protein of the present invention
when expressed in a cell, especially a mammalian cell.

The term "nucleic acid" refers to a
deoxyribonucleotide or ribonucleotide polymer in either
single- or double-stranded form, and unless specifically
limited, encompasses known analogues of natural nucleotides
that hybridize to nucleic acids in a manner similar to
naturally occurring nucleotides. Unless otherwise indicated,
a particular nucleic acid sequence implicitly provides the
complementary sequence thereof, as well as the sequence
explicitly indicated. As used herein, the terms "nucleic
acid" and "gene" are interchangeable, and they encompass the
term "cDNA."
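
Because a stated sequence implicitly provides its complement,
the complementary strand can be derived directly from the
explicit one. The Python sketch below illustrates this for DNA
using standard Watson-Crick pairing; the function name
reverse_complement is an illustrative choice, and only the four
unambiguous DNA bases are handled.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

Applying the function twice returns the original sequence,
which reflects that each strand fully determines the other.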

The phrase "a nucleic acid sequence encoding" refers
to a nucleic acid which contains sequence information that, if
translated, yields the primary amino acid sequence of a
specific protein or peptide. This phrase specifically
encompasses degenerate codons (i.e., different codons which
encode a single amino acid) of the native sequence or
sequences which may be introduced to conform with codon
preference in a specific host cell.
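
Codon degeneracy can be made concrete with the standard genetic
code. The following Python sketch builds that code table and
lists the synonymous codons for an amino acid; the helper names
synonymous_codons and translate are illustrative, and the sketch
assumes the standard (not an organism-specific) code and DNA
rather than RNA letters.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varying slowest); '*' marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def synonymous_codons(amino_acid: str) -> list:
    """Return all codons encoding the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(synonymous_codons("L"))
print(translate("TTTCTG"), translate("TTCTTA"))
```

The two short sequences in the usage lines differ at the
nucleotide level yet encode the same Phe-Leu dipeptide, which is
the sense in which degenerate variants encode the same protein.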

The phrase "nucleic acid construct" denotes a
nucleic acid that is composed of two or more nucleic acid
sequences that are derived from different sources and that are
ligated together using methods known in the art.

The term "regulatory sequence" denotes all the
non-coding elements of a nucleic acid sequence required for the
correct and efficient expression of the "coding region" (i.e.,
the region that actually encodes the amino acid sequence of a
peptide or protein), e.g., binding sites for polymerases and
transcription factors, transcription and translation
initiation and termination sequences, a TATA box, a promoter to
direct transcription, a ribosome binding site for
translational initiation, polyadenylation sequences, and
enhancer elements.

The term "isolated" refers to material which is 
substantially or essentially free from components which 
normally accompany it as found in its native state (for 
example, a band on a gel). The isolated nucleic acids and the 
isolated proteins of this invention do not contain materials 
normally associated with their in situ environment, in 
particular, nuclear, cytosolic or membrane associated proteins 
or nucleic acids other than those nucleic acids which are 
indicated. The term "homogeneous" refers to a peptide or DNA 
sequence where the primary molecular structure (i.e., the 
sequence of amino acids or nucleotides) of substantially all 
molecules present in the composition under consideration is 
identical. The term "substantially" used in the preceding 
sentences preferably means at least 80% by weight, more 
preferably at least 95% by weight, and most preferably at 
least 99% by weight. 

The nucleic acids of this invention, whether RNA, 
cDNA, genomic DNA, or a hybrid of the various combinations,
are synthesized in vitro or are isolated from natural sources 
or recombinant clones. The nucleic acids claimed herein are 
present in transformed or transfected whole cells, in 
transformed or transfected cell lysates, or in a partially 
purified or substantially pure form. The nucleic acids of the 
present invention are obtained as homogeneous preparations. 
They may be prepared by standard techniques well known in the 
art, including selective precipitation with such substances as
ammonium sulfate, isopropyl alcohol, ethyl alcohol, and/or
exclusion, ion exchange or affinity column chromatography,
immunopurification methods, and others.

The phrase "conservatively modified variants 
thereof," when used with reference to a protein, denotes
conservative amino acid substitutions in which both the
original and the substituted amino acids have similar
structure (e.g., the R group contains a carboxylic acid) and
properties (e.g., the original and the substituted amino acids
are acidic, such as glutamic and aspartic acid), such that the 
substitutions do not essentially alter specified properties of 
the protein, such as fluorescence. Amino acid substitutions 
that are conservative are well known in the art. The phrase 
"conservatively modified variants thereof," when used to 
describe a reference nucleic acid, denotes nucleic acids 
having nucleotide substitutions that yield degenerate codons 
for a given amino acid or that encode conservative amino acid
substitutions, as compared to the reference nucleic acid. 

The term "recombinant" or "engineered" when used 
with reference to a nucleic acid or a protein generally 
denotes that the composition or primary sequence of said 
nucleic acid or protein has been altered from the naturally 
occurring sequence using experimental manipulations well known 
to those skilled in the art. It may also denote that a
nucleic acid or protein has been isolated and cloned into a
vector, or that the nucleic acid has been introduced into
or expressed in a cell or cellular environment other than the
cell or cellular environment in which said nucleic acid or
protein may be found in nature. The phrase "engineered
Aequorea victoria fluorescent protein" specifically
encompasses a protein obtained by introducing one or more
sequence alterations into the coding region of a nucleic acid
that encodes wild type Aequorea victoria GFP, wherein the gene
product of the engineered nucleic acid is a fluorescent
protein recognized by antisera to wild type Aequorea victoria
GFP.

The term "recombinant" or "engineered" when used
with reference to a cell indicates that, as a result of
experimental manipulation, the cell replicates or expresses a
nucleic acid or expresses a peptide or protein encoded by a
nucleic acid, whose origin is exogenous to the cell.
Recombinant cells can express nucleic acids that are not found
within the native (non-recombinant) form of the cell.
Recombinant cells can also express nucleic acids found in the
native form of the cell wherein the nucleic acids are
re-introduced into the cell by artificial means.

The term "vector" denotes an engineered nucleic acid 
construct that contains sequence elements that mediate the 
replication of the vector sequence and/or the expression of 
coding sequences present on the vector. Examples of vectors 
include eukaryotic and prokaryotic plasmids, viruses (for 
example, the HIV virus), cosmids, phagemids, and the like.
The term "operably linked" refers to functional linkage 
between a first nucleic acid (for example, an expression 
control sequence such as a promoter or an array of 
transcription factor binding sites) and a second nucleic acid 
sequence, wherein the expression control sequence directs 
transcription of the nucleic acid corresponding to the second 
sequence. One or more selected isolated nucleic acids may be 
operably linked to a vector by methods known in the art. 

"Transduction" or "transformation" denotes the 
process whereby exogenous extracellular DNA is introduced into
a cell, such that the cell is capable of replicating and/or
expressing the exogenous DNA. Generally, a selected nucleic
acid is first inserted into a vector and the vector is then
introduced into the cell. For example, plasmid DNA that is
introduced under appropriate environmental conditions may
undergo replication in the transformed cell, and the
replicated copies are distributed to progeny cells when cell
division occurs. As a result, a new cell line is established,
containing the plasmid and carrying the genetic determinants
thereof. Transformation by a plasmid in this manner, where
the plasmid genes are maintained in the cell line by plasmid
replication, occurs at high frequency when the transforming
plasmid DNA is in closed loop form, and occurs rarely or not
at all if linear plasmid DNA is used.



WO 97/42320 



PCT/US97/07625



All the patents and publications cited in this
disclosure are indicative of the level of skill of those
skilled in the art to which this invention pertains and are
all herein individually incorporated by reference for all
purposes.

II. The GFP Mutants and Their Expression

A. The GFP mutants 

The isolated nucleic acids reported here are those 
that encode an engineered protein derived from Aequorea 
victoria green fluorescent protein ("GFP") having a 
fluorescence at maximum emission that is at least five times 
greater, preferably ten times greater, and most preferably 
twenty times greater than the fluorescence at maximum emission 
of wild type GFP. In one embodiment, a nucleic acid encodes 
for leucine at amino acid position 65. This amino acid 
position is important for the enhanced fluorescence. In 
another embodiment the engineered isolated GFP nucleic acid 
also encodes for threonine at amino acid position 168. In an 
additional embodiment, the engineered isolated GFP nucleic 
acid further encodes for cysteine at amino acid position 66. 

Also described here are GFP mutants that have
enhanced blue fluorescent properties. One such mutant is
encoded by an isolated nucleic acid that encodes an engineered
Aequorea victoria blue fluorescent protein having histidine
at amino acid position 67 and leucine at amino acid position
65, and having a cellular fluorescence that is at least five
times greater, preferably 10 times greater, and most preferably
20 times greater than that of BFP(Tyr67→His). An alternative
isolated BFP nucleic acid is one that encodes an engineered
Aequorea victoria blue fluorescent protein wherein the
engineered BFP has histidine at amino acid position 67 and
alanine at amino acid position 164. A third engineered
isolated BFP nucleic acid sequence is one that encodes
histidine at amino acid position 67, leucine at amino acid
position 65 and alanine at amino acid position 164.

The nucleic acid and amino acid sequences for the
wild type GFP are set out in SEQ ID NO:1 and SEQ ID NO:2. The
sequence is well known, well described and readily available
for manipulation and use. Vectors bearing the nucleic acid
sequence are commercially available from, for example,
Clontech Laboratories, Inc., Palo Alto, CA. Clontech provides
a line of reporter vectors for GFP, including the cDNA
construct described by Chalfie et al., supra, a promoterless
GFP vector for monitoring the expression of cloned promoters
in mammalian cells, and a series of vectors for creating
fusion proteins to either the amino or carboxy terminus of GFP.

One of skill in the art will recognize many ways of
generating alterations in a given nucleic acid sequence. Such
well-known methods include site-directed mutagenesis, PCR
amplification using degenerate oligonucleotides, exposure of
cells containing the nucleic acid to mutagenic agents or
radiation, chemical synthesis of a desired oligonucleotide
(e.g., in conjunction with ligation and/or cloning to generate
large nucleic acids) and other well-known techniques. See,
e.g., Berger and Kimmel, Guide to Molecular Cloning
Techniques, Methods in Enzymology Volume 152, Academic Press,
Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular
Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor Press, NY (Sambrook);
and Current Protocols in Molecular Biology, F.M. Ausubel et
al., eds., Current Protocols, a joint venture between Greene
Publishing Associates, Inc. and John Wiley & Sons, Inc. (1994
Supplement) (Ausubel); Pirrung et al., U.S. Patent No.
5,143,854; and Fodor et al., Science, 251, 767-77 (1991).
Product information from manufacturers of biological reagents
and experimental equipment also provides information useful in
known biological methods. Such manufacturers include the
SIGMA Chemical Company (Saint Louis, MO), R&D Systems
(Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway,
NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes
Corp., Aldrich Chemical Company (Milwaukee, WI), Glen
Research, Inc., GIBCO BRL Life Technologies, Inc.
(Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka
Chemie AG, Buchs, Switzerland), and Applied Biosystems (Foster
City, CA), as well as many other commercial sources known to
one of skill. Using these techniques, it is possible to
substitute at will any nucleotide in a nucleic acid that
encodes any GFP or BFP disclosed herein, or any amino acid in a
GFP or BFP described herein, for a predetermined nucleotide or
amino acid. For example, it is possible to generate at will
modified GFPs and BFP(Tyr67→His)s that contain leucine at
position 65 and one or two or three additional mutations at
any other position of the wtGFP or BFP(Tyr67→His).

The sequence of the cloned genes and synthetic
oligonucleotides can be verified using the chemical
degradation method of A.M. Maxam et al. (1980), Methods in
Enzymology 65:499-560. The sequence can be confirmed after
the assembly of the oligonucleotide fragments into the
double-stranded DNA sequence using the method of Maxam and
Gilbert, supra, or the chain termination method for sequencing
double-stranded templates of R.B. Wallace et al. (1981), Gene
16:21-26. DNA sequencing may also be performed by the
PCR-assisted fluorescent terminator method (ReadyReaction
DyeDeoxy Terminator Cycle Sequencing Kit, ABI, Columbia, MD)
according to the manufacturer's instructions, using the ABI
Model 373A DNA Sequencing System. Sequencing data are analyzed
using the commercially available Sequencher program (Gene
Codes, Ann Arbor, MI).

B. Expression of Mutant GFP

Clearly, the nucleic acid sequences of the present
invention are excellent reporter sequences since the expressed
proteins can be readily detected by fluorescence as described
below. The sequences can be used in conjunction with any
application appreciated to date for GFP and further in
applications where a greater degree of fluorescence is
required. Expression of the sequences described herein,
whether expression is desired alone or in combination with
other sequences of interest, is described below.

Vectors to which selected foreign nucleic acids are
operably linked may be used to introduce these selected
nucleic acids into host cells and mediate their replication
and/or expression. Cloning vectors are useful for replicating
the foreign nucleic acids and obtaining clones of specific
foreign nucleic acid-containing vectors. Expression vectors
mediate the expression of the foreign nucleic acid. Some
vectors are both cloning and expression vectors.

Once a nucleic acid is synthesized or isolated and 
inserted into a vector and cloned, one may express the nucleic 
acid in a variety of recombinantly engineered cells known to
those of skill in the art. As used herein, "expression" 
refers to transcription of nucleic acids, either without or 
preferably with subsequent translation. 

Expression of a mutant BFP or of wild type or mutant 
GFP can be enhanced by including multiple copies of the
GFP-encoding nucleic acid in a transformed host, by selecting a
vector known to reproduce in the host, thereby producing large
quantities of protein from exogenous inserted DNA (such as
pUC8, ptac12, or pIN-III-ompA1, 2, or 3), or by any other
known means of enhancing peptide expression. In all cases, 
wtGFP or mutant GFPs will be expressed when the DNA sequence 
is functionally inserted into a vector. "Functionally 
inserted" means that it is inserted in proper reading frame 
and orientation. Typically, a GFP gene will be inserted 
downstream from a promoter and will be followed by a stop 
codon, although production as a hybrid protein followed by 
cleavage may be used, if desired. 
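
The requirement that an insert be in proper reading frame and orientation, and be followed by a stop codon, can be sketched as a simple check. The following Python sketch is an illustration only: the function names and the short test sequences are hypothetical, and the check assumes that the insert begins at an ATG start codon and is read in the direction of the upstream promoter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def is_open_reading_frame(coding):
    """True if `coding` starts with ATG, is a whole number of codons,
    ends with a stop codon and has no earlier in-frame stop codon."""
    if len(coding) < 6 or len(coding) % 3 != 0 or not coding.startswith("ATG"):
        return False
    codons = [coding[i:i + 3] for i in range(0, len(coding), 3)]
    return codons[-1] in STOP_CODONS and not any(
        codon in STOP_CODONS for codon in codons[:-1]
    )


def insertion_status(insert):
    """Classify an insert as read from the promoter side.

    Returns "functional" if the insert is an open reading frame as given,
    "reversed" if only its reverse complement is, else "nonfunctional".
    """
    insert = insert.upper()
    if is_open_reading_frame(insert):
        return "functional"
    if is_open_reading_frame(reverse_complement(insert)):
        return "reversed"
    return "nonfunctional"


# Hypothetical 15-nucleotide insert: ATG AGT AAA GGA TAA.
status = insertion_status("ATGAGTAAAGGATAA")
```

A hybrid protein construct, as mentioned above, would need a different check, because the insert would then continue the reading frame of the upstream partner rather than begin with its own start codon.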

Examples of cells which are suitable for the cloning 
and expression of the nucleic acids of the invention include 
bacteria, yeast, filamentous fungi, insect (especially 
employing baculoviral vectors), and mammalian cells, in
particular cells capable of being maintained in tissue
culture.

Host cells are competent or rendered competent for 
transformation by various means. There are several well-known 
methods of introducing DNA into animal cells. These include:
calcium phosphate precipitation, fusion of the recipient cells
with bacterial protoplasts containing the DNA, treatment of
the recipient cells with liposomes containing the DNA, DEAE
dextran, receptor-mediated endocytosis, electroporation and
micro-injection of the DNA directly into the cells.

It is expected that those of skill in the art are 
knowledgeable in the numerous systems available for cloning
and expression of nucleic acids. In brief summary, the
expression of natural or synthetic nucleic acids is typically
achieved by operably linking a nucleic acid of interest to a
promoter (which is either constitutive or inducible), and
incorporating the construct into an expression vector. The
vectors are suitable for replication and integration in
prokaryotes, eukaryotes, or both. Typical cloning vectors
contain transcription and translation terminators,
transcription and translation initiation sequences, and
promoters useful for regulation of the expression of the
particular nucleic acid. The vectors optionally comprise
generic expression cassettes containing at least one
independent terminator sequence, sequences permitting
replication of the cassette in eukaryotes, or prokaryotes, or
both (e.g., shuttle vectors), and selection markers for both
prokaryotic and eukaryotic systems. See, e.g., Sambrook and
Ausubel (both supra).

1. Expression in Prokaryotes

Prokaryotic systems for cloning and/or expressing
engineered GFP or BFP proteins are available using E. coli,
Bacillus sp. and Salmonella (Palva, I. et al. (1983), Gene
22:229-235; Mosbach, K. et al. (1983), Nature 302:543-545). To
obtain high level expression in a prokaryotic system of a
cloned nucleic acid such as those encoding engineered GFPs or
BFPs, it is essential to construct expression vectors which
contain, at a minimum, a strong promoter to direct
transcription, a ribosome binding site for translational
initiation, a transcription/translation terminator, a
bacterial replicon, a nucleic acid encoding antibiotic
resistance to permit selection of bacteria that harbor
recombinant plasmids, and unique restriction sites in
nonessential regions of the plasmid to allow insertion of
foreign nucleic acids. The particular antibiotic resistance
gene chosen is not critical; any of the many resistance genes
known in the art are suitable. Examples of regulatory regions
suitable for this purpose in E. coli are the promoter and
operator region of the E. coli tryptophan biosynthetic pathway
as described by Yanofsky, C. (1984), J. Bacteriol.,
158:1018-1024, and the leftward promoter of phage lambda (PL)
as described by Herskowitz, I. and Hagen, D. (1980), Ann. Rev.
Genet., 14:399-445.
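
The minimum set of elements listed above can be treated as a checklist against which a candidate vector is compared. The Python sketch below is an illustration only: the element labels, the function name `missing_elements` and the example feature list are hypothetical and do not describe any particular vector.

```python
# Minimum elements named in the text for high level prokaryotic expression.
REQUIRED_ELEMENTS = (
    "strong promoter",
    "ribosome binding site",
    "transcription/translation terminator",
    "bacterial replicon",
    "antibiotic resistance gene",
    "unique restriction site",
)


def missing_elements(vector_features):
    """Return the required elements absent from `vector_features`.

    Matching is by exact label, ignoring case; the result keeps the
    order in which the elements are listed in REQUIRED_ELEMENTS.
    """
    present = {feature.lower() for feature in vector_features}
    return [element for element in REQUIRED_ELEMENTS if element not in present]


# Hypothetical annotation of a vector that still lacks two elements.
candidate = [
    "Strong promoter",
    "Ribosome binding site",
    "Bacterial replicon",
    "Antibiotic resistance gene",
]
gaps = missing_elements(candidate)
```

An empty result means only that every listed element is annotated; it says nothing about whether the elements are correctly positioned relative to one another.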

The particular vector used to transport the genetic 
information into the cell is not particularly critical. Any
of the conventional vectors used for replication, cloning
and/or expression in prokaryotic cells may be used.

The foreign nucleic acid can be incorporated into a
nonessential region of the host cell's chromosome. This is
achieved by first inserting the nucleic acid into a vector
such that it is flanked by regions of DNA homologous to the
insertion site in the host chromosome. After introduction of
the vector into a host cell, the foreign nucleic acid is
incorporated into the chromosome by homologous recombination
between the flanking sequences and chromosomal DNA.
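
The outcome of the integration described above can be modelled, very roughly, as a string operation. The Python sketch below is a toy model under stated assumptions, not a description of the recombination mechanism: it assumes exact sequence identity between each flanking region and the chromosome, a single occurrence of each flank, and a double crossover that replaces whatever lies between the two flanks. All names and sequences are hypothetical.

```python
def integrate_by_homology(chromosome, left_flank, insert, right_flank):
    """Model the product of a double crossover as a string replacement.

    The chromosomal segment between `left_flank` and `right_flank` is
    replaced by `insert`. Returns None if either flank is not found,
    i.e. there is no homology to recombine with.
    """
    left = chromosome.find(left_flank)
    if left == -1:
        return None
    left_end = left + len(left_flank)
    right = chromosome.find(right_flank, left_end)
    if right == -1:
        return None
    return chromosome[:left_end] + insert + chromosome[right:]


# Hypothetical sequences: a nonessential "TTTT" segment lies between flanks.
chromosome = "GGGG" + "ACGTAC" + "TTTT" + "CATGCA" + "GGGG"
product = integrate_by_homology(chromosome, "ACGTAC", "ATGAAATAA", "CATGCA")
```

In this model the flanks themselves are left unchanged in the product, which matches the description above in that only the foreign nucleic acid between them is newly incorporated.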

Detection of the expressed protein is achieved by 
methods known in the art such as radioimmunoassays, Western
blotting techniques or immunoprecipitation. Purification from
E. coli can be achieved following procedures described in U.S.
Patent No. 4,511,503.



2. Expression in Eukaryotes

Standard eukaryotic transfection methods are used to
produce mammalian, yeast or insect cell lines which express
large quantities of engineered GFP or BFP protein which are
then purified using standard techniques. See, e.g., Colley et
al. (1989), J. Biol. Chem. 264:17619-17622, and Guide to
Protein Purification, in Vol. 182 of Methods in Enzymology
(Deutscher ed., 1990), D.A. Morrison (1977), J. Bact.,
132:349-351, or by J.E. Clark-Curtiss and R. Curtiss (1983),
Methods in Enzymology 101:347-362, Eds. R. Wu et al.,
Academic Press, New York.

The particular eukaryotic expression vector used to
transport the genetic information into the cell is not
particularly critical. Any of the conventional vectors used
for expression in eukaryotic cells may be used. Expression
vectors containing regulatory elements from eukaryotic viruses
such as retroviruses are typically used. SV40 vectors include
pSVT7 and pMT2. Vectors derived from bovine papilloma virus
include pBV-1MTHA, and vectors derived from Epstein Barr virus
include pHEBO and p205. Other exemplary vectors include
pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and
any other vector allowing expression of proteins under the
direction of the SV-40 early promoter, SV-40 late promoter,
metallothionein promoter, murine mammary tumor virus promoter,
Rous sarcoma virus promoter, polyhedrin promoter, or other
promoters shown effective for expression in eukaryotic cells.

The expression vector typically comprises a 
eukaryotic transcription unit or expression cassette that 
contains all the elements required for the expression of the 
engineered GFP or BFP DNA in eukaryotic cells. A typical 
expression cassette contains a promoter operably linked to the 
DNA sequence encoding an engineered GFP or BFP protein and
signals required for efficient polyadenylation of the
transcript.

Eukaryotic promoters typically contain two types of 
recognition sequences, the TATA box and upstream promoter 
elements. The TATA box, located 25-30 base pairs upstream of 
the transcription initiation site, is thought to be involved 
in directing RNA polymerase to begin RNA synthesis. The other 
upstream promoter elements determine the rate at which 
transcription is initiated. 

Enhancer elements can stimulate transcription up to 
1,000 fold from linked homologous or heterologous promoters. 
Enhancers are active when placed downstream or upstream from 
the transcription initiation site. Many enhancer elements 
derived from viruses have a broad host range and are active in 
a variety of tissues. For example, the SV40 early gene
enhancer is suitable for many cell types. Other
enhancer/promoter combinations that are suitable for the
present invention include those derived from polyoma virus,
human or murine cytomegalovirus, the long terminal repeat from
various retroviruses such as murine leukemia virus, murine or
Rous sarcoma virus and HIV. See, Enhancers and Eukaryotic
Expression, Cold Spring Harbor Press, Cold Spring Harbor, N.Y.
1983, which is incorporated herein by reference. 

In the construction of the expression cassette, the 
promoter is preferably positioned about the same distance from 
the heterologous transcription start site as it is from the 
transcription start site in its natural setting. As is known 
in the art, however, some variation in this distance can be 
accommodated without loss of promoter function.

In addition to a promoter sequence, the expression
cassette should also contain a transcription termination
region downstream of the structural gene to provide for 
efficient termination. The termination region may be obtained 
from the same gene as the promoter sequence or may be obtained 
from different genes. 

If the mRNA encoded by the structural gene is to be
efficiently translated, polyadenylation sequences are also
commonly added to the vector construct. Two distinct sequence
elements are required for accurate and efficient
polyadenylation: GU or U rich sequences located downstream
from the polyadenylation site and a highly conserved sequence
of six nucleotides, AAUAAA, located 11-30 nucleotides
upstream. Termination and polyadenylation signals that are
suitable for the present invention include those derived from 
SV40, or a partial genomic copy of a gene already resident on 
the expression vector. 
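
The two polyadenylation elements described above can be looked for in a transcript sequence with a simple scan. The Python sketch below is an illustration only and rests on assumptions not stated in this specification: the distance of AAUAAA is measured from the first nucleotide of the hexamer to the polyadenylation site, the downstream window is 30 nucleotides, and "GU or U rich" is reduced to the fraction of G and U in that window. The function names and the toy transcript are hypothetical.

```python
def find_hexamer(rna, site):
    """Return the distance (11-30 nt) from the start of an AAUAAA hexamer
    to the polyadenylation site at index `site`, or None if absent.

    DNA input is accepted: T is read as U.
    """
    rna = rna.upper().replace("T", "U")
    for distance in range(11, 31):
        start = site - distance
        if start >= 0 and rna.startswith("AAUAAA", start):
            return distance
    return None


def downstream_gu_fraction(rna, site, window=30):
    """Return the fraction of G and U in the `window` nucleotides
    beginning at `site` (0.0 if nothing lies downstream)."""
    rna = rna.upper().replace("T", "U")
    region = rna[site:site + window]
    if not region:
        return 0.0
    return sum(base in "GU" for base in region) / len(region)


# Toy transcript: hexamer starts at index 20, polyadenylation site at 40,
# followed by a short GU-rich stretch.
toy = "C" * 20 + "AAUAAA" + "C" * 14 + "GUGUUUGUUU"
distance = find_hexamer(toy, 40)
gu_fraction = downstream_gu_fraction(toy, 40)
```

Any cut-off applied to the G and U fraction would be a further assumption, so the sketch reports the fraction and leaves the threshold to the reader.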

In addition to the elements already described, the 
expression vector of the present invention may typically
contain other specialized elements intended to increase the
level of expression of cloned nucleic acids or to facilitate
the identification of cells that carry the transfected DNA.
For instance, a number of animal viruses contain DNA sequences
that promote the extrachromosomal replication of the viral
genome in permissive cell types. Plasmids bearing these viral
replicons are replicated episomally as long as the appropriate
factors are provided by genes either carried on the plasmid or
with the genome of the host cell.

The DNA sequence encoding the engineered GFP or BFP
protein may typically be linked to a cleavable signal peptide
sequence to promote secretion of the encoded protein by the
transformed cell. Such signal peptides would include, among
others, the signal peptides from tissue plasminogen activator,
insulin, neuron growth factor, and juvenile hormone esterase
of Heliothis virescens. Additional elements of the cassette
may include enhancers and, if genomic DNA is used as the
structural gene, introns with functional splice donor and
acceptor sites.

The vector may or may not comprise a eukaryotic
replicon. If a eukaryotic replicon is present, then the 
vector is amplifiable in eukaryotic cells using the 
appropriate selectable marker. If the vector does not 
comprise a eukaryotic replicon, no episomal amplification is 
possible. Instead, the transfected DNA integrates into the 
genome of the transfected cell, where the promoter directs 
expression of the desired nucleic acid. 

The vectors usually comprise selectable markers 
which result in nucleic acid amplification such as the sodium, 
potassium ATPase, thymidine kinase, aminoglycoside 
phosphotransferase, hygromycin B phosphotransferase, 
xanthine-guanine phosphoribosyl transferase, CAD (carbamyl
phosphate synthetase, aspartate transcarbamylase, and
dihydroorotase), adenosine deaminase, dihydrofolate reductase,
and asparagine synthetase and ouabain selection.
Alternatively, high yield expression systems not involving
nucleic acid amplification are also suitable, such as using a
baculovirus vector in insect cells, with the engineered GFP
or BFP encoding sequence under the direction of the polyhedrin
promoter or other strong baculovirus promoters.

The expression vectors of the present invention will 
typically contain both prokaryotic sequences that facilitate 
the cloning of the vector in bacteria as well as one or more
eukaryotic transcription units that are expressed only in
eukaryotic cells, such as mammalian cells. The prokaryotic
sequences are preferably chosen such that they do not
interfere with the replication of the DNA in eukaryotic cells.

Any of the well known procedures for introducing 
foreign nucleotide sequences into host cells may be used.
These include the use of calcium phosphate transfection,
polybrene, protoplast fusion, electroporation, liposomes,
microinjection, plasmid vectors, viral vectors and any of the
other well known methods for introducing cloned genomic DNA,
cDNA, synthetic DNA or other foreign nucleic acid material
into a host cell (see Sambrook et al., supra). It is only
necessary that the particular genetic engineering procedure 
utilized be capable of successfully introducing at least one 
nucleic acid into the host cell which is capable of expressing 
the engineered GFP or BFP protein. 

3. Expression in insect cells

The baculovirus expression vector utilizes the
highly expressed and regulated Autographa californica nuclear
polyhedrosis virus (AcMNPV) polyhedrin promoter modified for
the insertion of foreign nucleic acids. Synthesis of
polyhedrin protein results in the formation of occlusion
bodies in the infected insect cell. The baculovirus vector
utilizes many of the protein modification, processing, and
transport systems that occur in higher eukaryotic cells. The
recombinant eukaryotic proteins expressed using this vector
have been found in many cases to be antigenically,
immunogenically, and functionally similar to their natural
counterparts.

Briefly, a DNA sequence encoding an engineered GFP 
or BFP is inserted into a transfer plasmid vector in the 
proper orientation downstream from the polyhedrin promoter,
and flanked on both ends with baculovirus sequences. Cultured
insect cells, commonly Spodoptera frugiperda cells, are
transfected with a mixture of viral and plasmid DNAs. The
viruses that develop, some of which are recombinant viruses
that result from homologous recombination between the two
DNAs, are plated at 100-1000 plaques per plate. The plaques
containing recombinant virus can be identified visually
because of their ability to form occlusion bodies or by DNA
hybridization. The recombinant virus is isolated by plaque
purification. The resulting recombinant virus, capable of
expressing engineered GFP or BFP, is self-propagating in that
no helper virus is required for maintenance or replication.
After infecting an insect culture with recombinant virus, one
can expect to find recombinant protein within 48-72 hours.
The infection is essentially lytic within 4-5 days.

There are a variety of transfer vectors into which 
the engineered GFP or BFP nucleic acid can be inserted. For a 
summary of transfer vectors see Luckow, V.A. and Summers, M.D. 
(1988), Bio/Technology 6:47-55. Preferred is the transfer 
vector pAcUW21 described by Bishop, D.H.L. (1992) in Seminars
in Virology 3:253-264.

4. Retroviral Vectors

Retroviral vectors are particularly useful for 
modifying eukaryotic cells because of the high efficiency with 
which the retroviral vectors transduce target cells and 
integrate into the target cell genome. Additionally, the 
retroviruses harboring the retroviral vector are capable of
infecting cells from a wide variety of tissues.

Retroviral vectors are produced by genetically
manipulating retroviruses. Retroviruses are RNA viruses
because the viral genome is RNA. Upon infection, this genomic
RNA is reverse transcribed into a DNA copy which is integrated
into the chromosomal DNA of transduced cells with a high
degree of stability and efficiency. The integrated DNA copy
is referred to as a provirus and is inherited by daughter
cells as is any other gene. The wild type retroviral genome
and the proviral DNA have three genes: the gag, the pol and
the env genes, which are flanked by two long terminal repeat
(LTR) sequences. The gag gene encodes the internal structural
(nucleocapsid) proteins; the pol gene encodes the RNA directed
DNA polymerase (reverse transcriptase); and the env gene
encodes viral envelope glycoproteins. The 5' and 3' LTRs
serve to promote transcription and polyadenylation of virion
RNAs. Adjacent to the 5' LTR are sequences necessary for
reverse transcription of the genome (the tRNA primer binding
site) and for efficient encapsulation of viral RNA into
particles (the Psi site). See Mulligan, R.C. (1983), In:
Experimental Manipulation of Gene Expression, M. Inouye (ed),
155-173; Mann, R. et al. (1983), Cell, 33:153-159; Cone, R.D.
and R.C. Mulligan (1984), Proceedings of the National Academy
of Sciences, U.S.A. 81:6349-6353.

The design of retroviral vectors is well known to
one of skill in the art. See Singer, M. and Berg, P. supra.
In brief, if the sequences necessary for encapsidation (or
packaging of retroviral RNA into infectious virions) are
missing from the viral genome, the result is a cis acting
defect which prevents encapsidation of genomic RNA. However,
the resulting mutant is still capable of directing the
synthesis of all virion proteins. Retroviral genomes from
which these sequences have been deleted, as well as cell lines
containing the mutant genome stably integrated into the
chromosome are well known in the art and are used to construct
retroviral vectors. Preparation of retroviral vectors and
their uses are described in many publications including
European Patent Application EPA 0 178 220, U.S. Patent
4,405,712, Gilboa (1986), Biotechniques 4:504-512, Mann, et
al. (1983), Cell 33:153-159, Cone and Mulligan (1984), Proc.
Natl. Acad. Sci. USA 81:6349-6353, Eglitis, M.A. et al. (1988)
Biotechniques 6:608-614, Miller, A.D. et al. (1989)
Biotechniques 7:981-990, Miller, A.D. (1992) Nature, supra,
Mulligan, R.C. (1993), supra, and Gould, B. et al., and
International Patent Application No. WO 92/07943 entitled
"Retroviral Vectors Useful in Gene Therapy." The teachings of
these patents and publications are incorporated herein by
reference.

The retroviral vector particles are prepared by 
recombinantly inserting the nucleic acid encoding engineered
GFP or BFP into a retrovirus vector and packaging the vector
with retroviral capsid proteins by use of a packaging cell
line. The resultant retroviral vector particle is incapable
of replication in the host cell and is capable of integrating
into the host cell genome as a proviral sequence containing
the engineered GFP or BFP nucleic acid. As a result, the 
patient is capable of producing engineered GFP or BFP and 
metabolize glycogen to completion. 

Packaging cell lines are used to prepare the 
retroviral vector particles. A packaging cell line is a 
genetically constructed mammalian tissue culture cell line 
that produces the necessary viral structural proteins required 
for packaging, but which is incapable of producing infectious 
virions. Retroviral vectors, on the other hand, lack the 
structural genes but have the nucleic acid sequences necessary 
for packaging. To prepare a packaging cell line, an
infectious clone of a desired retrovirus, in which the
packaging site has been deleted, is constructed. Cells
comprising this construct will express all structural proteins
but the introduced DNA will be incapable of being packaged.
Alternatively, packaging cell lines can be produced by
transforming a cell line with one or more expression plasmids
encoding the appropriate core and envelope proteins. In these
cells, the gag, pol, and env genes can be derived from the
same or different retroviruses.

A number of packaging cell lines suitable for the 
present invention are available in the prior art. Examples of
these cell lines include Crip, GPE86, PA317 and PG13. See
Miller et al. (1991), J. Virol. 65:2220-2224, which is
incorporated herein by reference. Examples of other packaging
cell lines are described in Cone, R. and Mulligan, R.C.
(1984), Proceedings of the National Academy of Sciences,
U.S.A., 81:6349-6353 and in Danos, O. and R.C. Mulligan
(1988), Proceedings of the National Academy of Sciences,
U.S.A., 85:6460-6464, Eglitis, M.A. et al. (1988)
Biotechniques 6:608-614, also all incorporated herein by
reference.

Packaging cell lines capable of producing retroviral 
vector particles with chimeric envelope proteins may be used. 
Alternatively, amphotropic or xenotropic envelope proteins,
such as those produced by PA317 and GPX packaging cell lines,
may be used to package the retroviral vectors.

Transforming cells with nucleic acids can involve,
for example, incubating viral vectors (e.g., retroviral or
adeno-associated viral vectors) containing the nucleic acids
with cells within the host range of the vector. See, e.g.,
Methods in Enzymology, Vol. 185, Academic Press, Inc., San
Diego, CA (D.V. Goeddel, ed.) (1990) or M. Krieger (1990), Gene
Transfer and Expression: A Laboratory Manual, Stockton Press,
New York, NY, and the references cited therein.

5. Transformation with adeno-associated virus

Adeno-associated viruses (AAVs) require helper
viruses such as adenovirus or herpes virus to achieve
productive infection. In the absence of helper virus
functions, AAV integrates (site-specifically) into a host
cell's genome, but the integrated AAV genome has no pathogenic
effect. The integration step allows the AAV genome to remain
genetically intact until the host is exposed to the
appropriate environmental conditions (e.g., a lytic helper
virus), whereupon it re-enters the lytic life-cycle. Samulski
(1993), Current Opinion in Genetics and Development 3:74-80 and
the references cited therein provide an overview of the AAV
life cycle.

AAV-based vectors are used to transduce cells with 
target nucleic acids, e.g., in the in vitro production of 
nucleic acids and peptides, and in in vivo and ex vivo gene 
therapy procedures. See West et al. (1987), Virology
160:38-47; Carter et al. (1989) U.S. Patent No. 4,797,368;
Carter et al. (1993), WO 93/24641; Kotin (1994), Human Gene
Therapy 5:793-801; Muzyczka (1994), J. Clin. Invest. 94:1351
and Samulski (supra) for an overview of AAV vectors.

Recombinant AAV vectors (rAAV vectors) deliver
foreign nucleic acids to a wide range of mammalian cells
(Hermonat & Muzyczka (1984), Proc. Natl. Acad. Sci. USA
81:6466-6470; Tratschin et al. (1985), Mol. Cell Biol.
5:3251-3260), integrate into the host chromosome (McLaughlin
et al. (1988), J. Virol. 62:1963-1973), and show stable
expression of the transgene in cell and animal models (Flotte
et al. (1993), Proc. Natl. Acad. Sci. USA 90:10613-10617).
Moreover, unlike some retroviral vectors, rAAV vectors are
able to infect non-dividing cells (Podsakoff et al. (1994), J.
Virol. 68:5656-66; Flotte et al. (1994), Am. J. Respir. Cell
Mol. Biol. 11:517-521). Further advantages of rAAV vectors
include the lack of an intrinsic strong promoter, thus
avoiding possible activation of downstream cellular sequences,
and their naked icosahedral capsid structure, which renders
them stable and easy to concentrate by common laboratory
techniques. rAAV vectors are used to inhibit, e.g., viral
infection, by including anti-viral transcription cassettes in
the rAAV vector which comprise an inhibitor of the invention.

6. Expression in recombinant vaccinia virus-infected cells

The nucleic acid encoding engineered GFP or BFP is
inserted into a plasmid designed for producing recombinant
vaccinia, such as pGS62, Langford, C.L. et al. (1986), Mol.
Cell. Biol. 6:3191-3199. This plasmid consists of a cloning
site for insertion of foreign nucleic acids, the P7.5 promoter
of vaccinia to direct synthesis of the inserted nucleic acid,
and the vaccinia TK gene flanking both ends of the foreign
nucleic acid.

When the plasmid containing the engineered GFP or 
BFP nucleic acid is constructed, the nucleic acid can be 
transferred to vaccinia virus by homologous recombination in 
the infected cell. To achieve this, the recombinant plasmid
is transfected by standard calcium phosphate precipitation
techniques into suitable recipient cells already infected
with the desired strain of vaccinia virus, such as
Wyeth, Lister, WR or Copenhagen. Homologous recombination
occurs between the TK gene in the virus and the flanking TK
gene sequences in the plasmid. This results in a recombinant
virus with the foreign nucleic acid inserted into the viral TK
gene, thus rendering the TK gene inactive. Cells containing
recombinant viruses are selected by adding medium containing
5-bromodeoxyuridine, which is lethal for cells expressing a TK
gene.

Confirmation of production of recombinant virus is
achieved by DNA hybridization using cDNA encoding the
engineered GFP or BFP and by immunodetection techniques using
antibodies specific for the expressed protein. Virus stocks
may be prepared by infection of cells such as HeLa S3 spinner
cells and harvesting of virus progeny.

7. Expression in cell cultures

GFP- or BFP-encoding nucleic acids can be ligated to
various expression vectors for use in transforming host cell
cultures. The culture of cells used in conjunction with the
present invention is well known in the art. Freshney (1994)
(Culture of Animal Cells, a Manual of Basic Technique, third
edition, Wiley-Liss, New York), Kuchler et al. (1977)
Biochemical Methods in Cell Culture and Virology, Kuchler,
R.J., Dowden, Hutchinson and Ross, Inc., and the references
cited therein provide a general guide to the culture of
cells. Illustrative cell cultures useful for the production
of recombinant proteins include cells of insect or mammalian
origin. Mammalian cell systems often will be in the form of
monolayers of cells, although mammalian cell suspensions are
also used. Illustrative examples of mammalian cell lines
include monocytes, lymphocytes, macrophages, VERO and HeLa
cells, Chinese hamster ovary (CHO) cell lines, WI38, BHK,
COS-7 or MDCK cell lines (see, e.g., Freshney, supra).

Cells of mammalian origin are illustrative of cell 
cultures useful for the production of the engineered GFP or 
BFP. Mammalian cell systems often will be in the form of
monolayers of cells although mammalian cell suspensions may 
also be used. Illustrative examples of mammalian cell lines 
include VERO and HeLa cells, Chinese hamster ovary (CHO) cell 
lines, WI38, BHK, COS-7 or MDCK cell lines. 

As indicated above, the vector, e.g., a plasmid, 
which is used to transform the host cell, preferably contains 
DNA sequences to initiate transcription and sequences to 
control the translation of the engineered GFP or BFP nucleic 
acid sequence. These sequences are referred to as expression
control sequences. Illustrative expression control sequences
are obtained from the SV-40 promoter (Science 222:524-527,
(1983)), the CMV I.E. promoter (Proc. Natl. Acad. Sci.
81:659-663, (1984)) or the metallothionein promoter (Nature
296:39-42, (1982)). The cloning vector containing the
expression control sequences is cleaved using restriction
enzymes and adjusted in size as necessary or desirable and
ligated with sequences encoding the engineered GFP or BFP
protein by means well known in the art.

The vectors for transforming cells in culture
typically contain gene sequences to initiate transcription and
translation of the engineered GFP or BFP gene. These
sequences need to be compatible with the selected host cell.
In addition, the vectors preferably contain a marker to
provide a phenotypic trait for selection of transformed host
cells such as dihydrofolate reductase or metallothionein.
Additionally, a vector might contain a replicative origin.

As mentioned above, when higher animal host cells 
are employed, polyadenylation or transcription terminator
sequences from known mammalian genes need to be incorporated
into the vector. An example of a terminator sequence is the
polyadenylation sequence from the bovine growth hormone gene.
Sequences for accurate splicing of the transcript may also be
included. An example of a splicing sequence is the VP1 intron
from SV40 (Sprague, J. et al. (1983), J. Virol. 45:773-781).

Additionally, gene sequences to control replication
in the host cell may be incorporated into the vector such as
those found in bovine papilloma virus type-vectors.
Saveria-Campo, M. (1985), "Bovine Papilloma virus DNA a
Eukaryotic Cloning Vector" in DNA Cloning Vol. II a Practical
Approach Ed. D.M. Glover, IRL Press, Arlington, Virginia pp.
213-238.

The transformed cells are cultured by means well 
known in the art. For example, as published in Kuchler, R.J.
et al., (1977), Biochemical Methods in Cell Culture and
Virology.

In addition to the above general procedures which 
can be used for preparing recombinant DNA molecules and 
transformed unicellular organisms in accordance with the
practices of this invention, other known techniques and
modifications thereof can be used in carrying out the practice
of the invention. Any known system for expression of isolated
genes is suitable for use in the present invention. For
example, viral expression systems such as the baculovirus
expression system are specifically contemplated within the
scope of the invention. Many recent U.S. patents disclose
plasmids, genetically engineered microorganisms, and methods
of conducting genetic engineering which can be used in the 
practice of the present invention. For example, U.S. Pat. No. 
4,273,875 discloses a plasmid and a process of isolating the 
same. U.S. Pat. No. 4,304,863 discloses a process for 
producing bacteria by genetic engineering in which a hybrid 
plasmid is constructed and used to transform a bacterial host. 
U.S. Pat. No. 4,419,450 discloses a plasmid useful as a 
cloning vehicle in recombinant DNA work. U.S. Pat. No. 
4,362,867 discloses recombinant cDNA construction methods and 
hybrid nucleotides produced thereby which are useful in 
cloning processes. U.S. Pat. No. 4,403,036 discloses genetic 
reagents for generating plasmids containing multiple copies of 
DNA segments. U.S. Pat. No. 4,363,877 discloses recombinant 
DNA transfer vectors. U.S. Pat. No. 4,356,270 discloses a 
recombinant DNA cloning vehicle and is a particularly useful 
disclosure for those with limited experience in the area of
genetic engineering since it defines many of the terms used in
genetic engineering and the basic processes used therein.
U.S. Pat. No. 4,336,336 discloses a fused gene and a method of
making the same. U.S. Pat. No. 4,319,629 discloses plasmid 
vectors and the production and use thereof. U.S. Pat. No. 
4,332,901 discloses a cloning vector useful in recombinant 
DNA. Although some of these patents are directed to the 
production of a particular gene product that is not within the 
scope of the present invention, the procedures described 
therein can easily be modified to the practice of the 
invention described in this specification by those skilled in 
the art of genetic engineering. Transferring the isolated GFP
cDNA to other expression vectors will produce constructs which
improve the expression of the GFP polypeptide in E. coli or
express GFP in other hosts. 

III. Detection of GFP and BFP Nucleic Acids and Proteins

A. General detection methods 

The nucleic acids and proteins of the invention are 
detected, confirmed and quantified by any of a number of means 
well known to those of skill in the art. The unique quality 
of the inventive expressed proteins here is that they provide 
an enhanced fluorescence which can be readily and easily 
observed. Fluorescence assays for the expressed proteins are 
described in detail below. Other general methods for
detecting both nucleic acids and corresponding proteins
include analytic biochemical methods such as
spectrophotometry, radiography, electrophoresis, capillary
electrophoresis, high performance liquid chromatography
(HPLC), thin layer chromatography (TLC), hyperdiffusion
chromatography, and the like, and various immunological
methods such as fluid or gel precipitin reactions,
immunodiffusion (single or double), immunoelectrophoresis,
radioimmunoassays (RIAs), enzyme-linked immunosorbent assays
(ELISAs), immunofluorescent assays, and the like. The
detection of nucleic acids proceeds by well known methods such
as Southern analysis, northern analysis, gel electrophoresis,
PCR, radiolabeling, scintillation counting, and affinity
chromatography.

A variety of methods of specific DNA and RNA 
measurement using nucleic acid hybridization techniques are 
known to those of skill in the art. For example, one method 
for evaluating the presence or absence of engineered GFP or 
BFP DNA in a sample involves a Southern transfer. Southern et
al. (1975), J. Mol. Biol. 98:503. Briefly, the digested
genomic DNA is run on agarose slab gels in buffer and
transferred to membranes. Hybridization is carried out using
the probes discussed above. Visualization of the hybridized
portions allows the qualitative determination of the presence
or absence of engineered GFP or BFP genes.

Similarly, a Northern transfer may be used for the
detection of engineered GFP or BFP mRNA in samples of RNA from
cells expressing the engineered GFP or BFP gene. In brief,
the mRNA is isolated from a given cell sample using an acid
guanidinium-phenol-chloroform extraction method. The mRNA is
then electrophoresed to separate the mRNA species and the mRNA
is transferred from the gel to a nitrocellulose membrane. As
with the Southern blots, labeled probes are used to identify
the presence or absence of the engineered GFP or BFP
transcript.

The selection of a nucleic acid hybridization format 
is not critical. A variety of nucleic acid hybridization 
formats are known to those skilled in the art. For example,
common formats include sandwich assays and competition or
displacement assays. Hybridization techniques are generally
described in "Nucleic Acid Hybridization, A Practical
Approach," Ed. Hames, B.D. and Higgins, S.J., IRL Press, 1985;
Gall and Pardue (1969), Proc. Natl. Acad. Sci. USA 63:378-383;
and John, Birnstiel and Jones (1969), Nature 223:582-587.

For example, sandwich assays are commercially useful
hybridization assays for detecting or isolating nucleic acid
sequences. Such assays utilize a "capture" nucleic acid
covalently immobilized to a solid support and labelled
"signal" nucleic acid in solution. The clinical sample will
provide the target nucleic acid. The "capture" nucleic acid
and "signal" nucleic acid probe hybridize with the target
nucleic acid to form a "sandwich" hybridization complex. To
be effective, the signal nucleic acid cannot hybridize with
the capture nucleic acid.

The nucleic acid sequences used in this invention
can be either positive or negative probes. Positive probes
bind to their targets and the presence of duplex formation is
evidence of the presence of the target. Negative probes fail
to bind to the suspect target and the absence of duplex
formation is evidence of the presence of the target. For
example, the use of a wild type specific nucleic acid probe or
PCR primers may act as a negative probe in an assay sample
where only the mutant engineered GFP or BFP is present.

Labelled signal nucleic acids, whether those
described herein or others known in the art, are used to detect
hybridization. Complementary nucleic acids or signal nucleic
acids may be labelled by any one of several methods typically
used to detect the presence of hybridized polynucleotides.
One common method of detection is the use of autoradiography
with ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P labelled probes or the like.
Other labels include ligands which bind to labelled
antibodies, fluorophores, chemiluminescent agents, enzymes,
and antibodies which can serve as specific binding pair
members for a labelled ligand.

Detection of a hybridization complex may require the 
binding of a signal generating complex to a duplex of target 
and probe polynucleotides or nucleic acids. Typically, such 
binding occurs through ligand and anti-ligand interactions as
between a ligand-conjugated probe and an anti-ligand
conjugated with a signal. The binding of the signal
generation complex is also readily amenable to accelerations
by exposure to ultrasonic energy.

The label may also allow indirect detection of the
hybridization complex. For example, where the label is a
hapten or antigen, the sample can be detected by using
antibodies. In these systems, a signal is generated by
attaching fluorescent or enzyme molecules to the antibodies or
in some cases, by attachment to a radioactive label.
(Tijssen, P. (1985), "Practice and Theory of Enzyme
Immunoassays," Laboratory Techniques in Biochemistry and
Molecular Biology, Burdon, R.H., van Knippenberg, P.H., Eds.,
Elsevier, pp. 9-20.)

The sensitivity of the hybridization assays may be 
enhanced through use of a nucleic acid amplification system 
which multiplies the target nucleic acid being detected. In 
vitro amplification techniques suitable for amplifying 
sequences for use as molecular probes or for generating 
nucleic acid fragments for subsequent subcloning are known.
Examples of techniques sufficient to direct persons of skill
through such in vitro amplification methods, including the
polymerase chain reaction (PCR), the ligase chain reaction
(LCR), Qβ-replicase amplification and other RNA polymerase
mediated techniques (e.g., NASBA) are found in Berger,
Sambrook, and Ausubel, as well as Mullis et al. (1987), U.S.
Patent No. 4,683,202; PCR Protocols A Guide to Methods and
Applications (Innis et al., eds) Academic Press Inc. San
Diego, CA (1990) (Innis); Arnheim & Levinson (October 1,
1990), Chem. Eng. News 36-47; J. NIH Res. (1991) 3:81-94;
Kwoh et al. (1989), Proc. Natl. Acad. Sci. USA 86:1173;
Guatelli et al. (1990), Proc. Natl. Acad. Sci. USA 87:1874;
Lomeli et al. (1989), J. Clin. Chem. 35:1826; Landegren et al.
(1988), Science 241:1077-1080; Van Brunt (1990), Biotechnology
8:291-294; Wu and Wallace (1989), Gene 4:560; Barringer et al.
(1990), Gene 89:117, and Sooknanan and Malek (1995),
Biotechnology 13:563-564. Improved methods of cloning
in vitro amplified nucleic acids are described in Wallace et
al., U.S. Pat. No. 5,426,039. Other methods recently
described in the art are the nucleic acid sequence based
amplification (NASBA™, Cangene, Mississauga, Ontario) and Q
Beta Replicase systems. These systems can be used to directly
identify mutants where the PCR or LCR primers are designed to
be extended or ligated only when a select sequence is present.
Alternatively, the select sequences can be generally amplified
using, for example, nonspecific PCR primers and the amplified
target region later probed for a specific sequence indicative
of a mutation.
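
The idea of a primer that is extended only when a select sequence is present can be illustrated with a short sketch. It assumes a deliberately simplified rule that is not stated in the text: a primer is extended only if its 3'-terminal base pairs with the template and at most one internal base is mismatched. The template sequences, the primer and the function name are hypothetical and chosen only for illustration.

```python
def extends(template: str, primer: str, start: int) -> bool:
    """Simplified model of mutation-specific priming.

    The primer is written in the same sense as the template so that a
    plain string comparison stands in for base pairing. Extension is
    allowed only when the 3'-terminal base matches and no more than
    one internal position is mismatched (an assumed rule).
    """
    site = template[start:start + len(primer)]
    if len(site) < len(primer):
        return False  # primer would run off the end of the template
    if site[-1] != primer[-1]:
        return False  # 3'-terminal mismatch blocks extension
    internal = sum(a != b for a, b in zip(site[:-1], primer[:-1]))
    return internal <= 1


# Hypothetical sequences differing by a single T-to-C change at index 18.
WILD_TYPE = "GGTGAAGGTGATGCAACTTTCGGTGTT"
MUTANT = "GGTGAAGGTGATGCAACTCTCGGTGTT"
MUTANT_PRIMER = "GGTGATGCAACTC"  # 3' end sits on the changed base

print(extends(MUTANT, MUTANT_PRIMER, 6))     # mutant template
print(extends(WILD_TYPE, MUTANT_PRIMER, 6))  # wild type template
```

In this toy model the primer is extended on the mutant template and not on the wild type template, which mirrors the direct identification of mutants described above.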

Oligonucleotides for use as probes, e.g., in in 
vitro amplification methods, for use as gene probes, or as 
inhibitor components are typically synthesized chemically 
according to the solid phase phosphoramidite triester method
described by Beaucage and Caruthers (1981), Tetrahedron Letts.
22(20):1859-1862, e.g., using an automated synthesizer, as
described in Needham-VanDevanter et al. (1984), Nucleic Acids
Res. 12:6159-6168. Purification of oligonucleotides, where
necessary, is typically performed by either native acrylamide
gel electrophoresis or by anion-exchange HPLC as described in
Pearson and Regnier (1983), J. Chrom. 255:137-149. The
sequence of the synthetic oligonucleotides can be verified
using the chemical degradation method of Maxam and Gilbert
(1980) in Grossman and Moldave (eds.) Academic Press, New
York, Methods in Enzymology 65:499-560.

An alternative means for determining the level of 
expression of the engineered GFP or BFP gene is in situ 
hybridization. In situ hybridization assays are well known 
and are generally described in Angerer et al. (1987), Methods
Enzymol. 152:649-660. In an in situ hybridization assay, cells
are fixed to a solid support, typically a glass slide. If DNA
is to be probed, the cells are denatured with heat or alkali.
The cells are then contacted with a hybridization solution at
a moderate temperature to permit annealing of engineered GFP
or BFP specific probes that are labelled. The probes are
preferably labelled with radioisotopes or fluorescent
reporters.

B. Fluorescence Assay 

When a fluorophore such as a protein that is capable
of fluorescing is exposed to a light of appropriate 
wavelength, it will absorb and store light and then release 
the stored light energy. The range of wavelengths that a 
fluorophore is capable of absorbing is the excitation spectrum 
and the range of wavelengths of light that a fluorophore is 
capable of emitting is the emission or fluorescence spectrum. 
The excitation and fluorescence spectra for a given 
fluorophore usually differ and may be readily measured using 
known instruments and methods. For example, scintillation
counters and photometers (e.g., luminometers), photographic
film, and solid state devices such as charge coupled devices, 
may be used to detect and measure the emission of light. 

The nucleic acids, vectors, and mutant proteins provided
herein, in combination with well known techniques for
over-expressing recombinant proteins, make it possible to obtain
unlimited supplies of homogeneous mutant GFPs and BFPs. These
modified GFPs or BFPs having increased fluorescent activity
replace wtGFP or other currently employed tracers in existing
diagnostic and assay systems. Such currently employed tracers
include radioactive atoms or molecules and color-producing
enzymes such as horseradish peroxidase.

The benefits of using the mutants of the present
invention are at least four-fold: the modified GFPs and BFPs
are safer than radioactive-based assays, modified GFPs and
BFPs can be assayed quickly and easily, and large numbers of
samples can be handled simultaneously, reducing overall
handling and increasing efficiency. Of great significance,
the expression and subcellular distribution of the fluorescent
proteins within cells can be detected in living tissues
without any other experimental manipulation than placing
the cells on a slide and viewing them through a fluorescence
microscope. This represents a vast improvement over methods
of immunodetection that require fixation and subsequent
labelling.

The modified GFPs and BFPs of the present invention
can be used in standard assays involving a fluorescent marker.
For example, ligand-ligator binding pairs that can be modified
with the mutants of the present invention without disrupting
the ability of each to bind to the other can form the basis of
an assay encompassed by the present invention. These and
other assays are known in the art and their use with the GFPs
and BFPs of the present invention will become obvious to one
skilled in the art in light of the teachings disclosed herein.
Examples of such assays include competitive assays wherein
labeled and unlabeled ligands competitively bind to a ligator,
and noncompetitive assays where a ligand is captured by a ligator
and either measured directly or "sandwiched" with a secondary
ligator that is labeled. Still other types of assays include
immunoassays, single-step homogeneous assays, multiple-step
heterogeneous assays, and enzyme assays.

In a number of embodiments, the mutant GFPs and BFPs
are combined with fluorescent microscopy using known
techniques (see, e.g., Stauber et al., Virol. 213:439-454
(1995)) or preferably with fluorescence activated cell sorting
(FACS) to detect and optionally purify or clone cells that
express specific recombinant constructs. For a brief overview
of the FACS and its uses, see: Herzenberg et al., 1976,
"Fluorescence activated cell sorting", Sci. Amer. 234, 108;
see also Flow Cytometry and Sorting, eds. Melamed, Mullaney and
Mendelsohn, John Wiley and Sons, Inc., New York, 1979.
Briefly, fluorescence activated cell sorters take a suspension
of cells and pass them single file into the light path of a
laser placed near a detector. The laser usually has a set
wavelength. The detector measures the fluorescent emission
intensity of each cell as it passes through the instrument and
generates a histogram plot of cell number versus fluorescent
intensity. Gates or limits can be placed on the histogram 
thus identifying a particular population of cells. In one 
embodiment, the cell sorter is set up to select cells having 
the highest probe intensity, usually a small fraction of the 
cells in the culture, and to separate these selected cells 
away from all the other cells. The level of intensity at 
which the sorter is set and the fraction of cells which is 
selected, depend on the condition of the parent culture and 
the criteria of the isolation. In general, the operator 
should first sort an aliquot of the culture, and record the 
histogram of intensity versus number of cells. The operator 
can then set the selection level and isolate an appropriate 
number of the most active cells. Currently, fluorescence 
activated cell sorters are equipped with automated cell 
cloning devices. Such a device enables one to instruct the 
instrument to singly deposit a selected cell into an 
individual growth well, where it is allowed to grow into a 
monoclonal culture. Thus, genetic homogeneity is established
within the newly cloned culture.
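
The gating procedure described above (record a histogram for an aliquot, set a selection level, keep the brightest fraction of cells) can be sketched in a few lines. This is only an illustration under stated assumptions: the intensity values, the bin width and the function names are hypothetical, and a real sorter applies its gates in instrument software rather than on a list of numbers.

```python
def histogram(intensities, bin_width):
    """Count cells per intensity bin (bin label = lower edge of the bin)."""
    counts = {}
    for value in intensities:
        lower_edge = int(value // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


def gate_threshold(intensities, top_fraction):
    """Intensity cutoff that keeps roughly the brightest `top_fraction` of cells."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(intensities, reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    return ranked[n_keep - 1]


def select_cells(intensities, threshold):
    """Indices of cells whose intensity is at or above the gate."""
    return [i for i, value in enumerate(intensities) if value >= threshold]


# Hypothetical fluorescence readings from a small aliquot of the culture.
aliquot = [12, 15, 9, 240, 18, 11, 310, 14, 16, 13]

print(histogram(aliquot, bin_width=100))
cutoff = gate_threshold(aliquot, top_fraction=0.2)
print(cutoff, select_cells(aliquot, cutoff))
```

The fraction passed to `gate_threshold` plays the role of the operator's selection level; as the text notes, the appropriate value depends on the parent culture and on the criteria of the isolation.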

IV. General Applications for the GFP Mutants

It should be self-evident that the mutant GFP and 
BFP sequences described here have unlimited uses, particularly 
as signal or reporter sequences for the co-expression of other 
nucleic acid sequences of interest and/or to track the 
location and/or movement of other sequences within the cell, 
within tissue and the like. For example, these reporter type 
sequences could be used to track the spread (or lack thereof) 
of a disease causal agent in drug screening assays or could
readily be used in diagnostics. Some of the more interesting
applications are described below. 

A. Protein Trafficking

Normally, expressed mutant GFPs and BFPs are 
distributed throughout the cell (particularly mammalian 
cells), except for the nucleolus. However, as described 
below, when a GFP mutant is fused to the HIV-1 Rev protein, a 
hybrid molecule results which retains the Rev function and is 
localized mainly in the nucleolus where Rev is found. Fusion 
to the N-terminal domain of the HIV-1 Nef protein produces a 
hybrid protein detectable in the plasma membrane. Thus, the 
GFP mutants can be used to monitor the subcellular targeting 
and transport of proteins to which they are fused. 

B. Gene Therapy 

The mutant GFPs described here have interesting and 
useful applications in gene therapy. Gene therapy in general 
is the correction of genetic defects by insertion of exogenous 
cellular genes that encode a desired function into cells that 
lack that function, such that the expression of the exogenous 
gene a) corrects a genetic defect or b) causes the destruction 
of cells that are genetically defective. Methods of gene 
therapy are well known in the art, see, for example, Lu, M.,
et al. (1994), Human Gene Therapy 5:203; Smith, C. (1992), J.
Hematotherapy 1:155; Cassel, A., et al. (1993), Exp. Hematol.
21:585; Larrick, J.W. and Burck, K.L., Gene Therapy:
Application of Molecular Biology, Elsevier Science Publishing Co.,
Inc., New York, New York (1991) and Kriegler, M. Gene Transfer
and Expression: A Laboratory Manual, W.H. Freeman and Company, New
York (1990), each incorporated herein by reference. One
modality of gene therapy involves (a) obtaining from a patient 
a viable sample of primary cells of a particular cell type; 
(b) inserting into these primary cells a nucleic acid segment 
encoding a desired gene product; (c) identifying and isolating 
cells and cell lines that express the gene product; (d) re- 
introducing cells that express the gene product; (e) removing 
from the patient an aliquot of tissue including cells 
resulting from step c and their progeny; and (f) determining
the quantity of the cells resulting from step c and their 
progeny, in said aliquot. The introduction into cells in step 
c of a polycistronic vector that encodes GFP or BFP in 
addition to the desired gene allows for the quick 
identification of viable cells that contain and express the 
desired gene. 

Another gene therapy modality involves inserting the 
desired nucleic acid into selected tissue cells in situ, for 
example into cancerous or diseased cells, by contacting the 
target cells in situ with retroviral vectors that encode the 
gene product in question. Here, it is important to quickly 
and reliably assess which and what proportion of cells have 
been transfected. Co-expression of GFP and BFP permits a
quick assessment of the proportion of cells that are transfected,
and levels of expression.

C. Diagnostics 

One potential application of the GFP/BFP variants is
in diagnostic testing. The GFP/BFP gene, when placed under
the control of promoters induced by various agents, can serve
as an indicator for these agents. Established cell lines or
cells and tissues from transgenic animals carrying GFP/BFP
expressed under the desired promoter will become fluorescent
in the presence of the inducing agent.

Viral promoters which are transactivated by the
corresponding virus, promoters of heat shock genes which are
induced by various cellular stresses, as well as promoters
which are sensitive to organismal responses, e.g.
inflammation, can be used in combination with the described
GFP/BFP mutants in diagnostics.

In addition, the effect of selected culture 
conditions and components (salt concentrations, pH, 
temperature, trans-acting regulatory substances, hormones, 
cell-cell contacts, ligands of cell surface and internal 
receptors) can be assessed by incubating cells in which
sequences encoding the fluorescent proteins provided herein
are operably linked to nucleic acids (especially regulatory
elements such as promoters) derived from a selected gene, and
detecting the expression and location of fluorescence.

D. Toxicology 

Another application of the GFP/BFP-based
methodologies is in the area of toxicology. Assessment of the 
mutagenic potential of any compound is a prerequisite for its 
use. Until recently, the Ames assay in Salmonella and tests 
based on chromosomal aberrations or sister chromatid exchanges 
in cultured mammalian cells were the main tools in toxicology. 
However, both assays are of limited sensitivity and 
specificity and do not allow studies on mutation induction in 
various organs or tissues of the intact organism. 

The introduction of transgenic mice with a 
mutational target in a shuttle vector has made possible the 
detection of induced mutations in different tissues in vivo. 
The assay involves DNA isolation from tissues of exposed mice, 
packaging of the target DNA into bacteriophage lambda 
particles and subsequent infection of E. coli. The mutational
target in this assay is either the lacZ or lacI genes and
quantitation of blue vs white plaques on the bacterial lawn
allows for mutagenic assessment.

GFP/BFP could significantly simplify both the tissue 
culture and transgenic mouse procedures. Expression of 
GFP/BFP under the control of a repressor, which in turn is 
driven by the promoter of a constitutively expressed gene,
will establish a rapid method for evaluating the mutagenic
potential of an agent. The presence of fluorescent cells,
following exposure of a cell line, tissue or whole animal
carrying the GFP/BFP-based detection construct, will reflect
the mutagenicity of the compound in question. GFP/BFP
expressed under the control of the target DNA, the repressor
gene, will only be synthesized when the repressor is
inactivated or turned off or the repressor recognition
sequences are mutated. Direct visualization of the detector
cell line or tissue biopsy can qualitatively assess the
mutagenicity of the agent, while FACS of the dissociated cells 
can provide for quantitative analysis. 
E. Drug Screening

The GFP/BFP detection system could also 
significantly expedite and reduce the cost of some current 
drug screening procedures. A dual color screening system 
(DCSS), in which GFP is placed under the promoter of a target
gene and BFP is expressed from a constitutive promoter, could
provide for rapid analysis of agents that specifically affect
the target gene. Established cell lines with the DCSS could
be screened with hundreds of compounds in a few hours. The
desired drug will only influence the expression of GFP.
Non-specific or cytotoxic effects will be detected by the
second marker, BFP. The advantages of this system are that no
exogenous substances are required for GFP and BFP detection,
the assay can be used with single cells, cell populations, or
cell extracts, and that the same detection technology and
instrumentation is used for very rapid and non-destructive
detection.

The search for antiviral agents which specifically 
block viral transcription without affecting cellular 
transcription, could be significantly improved by the DCSS. 
In the case of HIV, appropriate cell lines expressing GFP 
under the HIV LTR and BFP under a cellular constitutive 
promoter, could identify compounds which selectively inhibit 
HIV transcription. Reduction of only the green but not the 
blue fluorescent signal will indicate drug specificity for the 
HIV promoter. Similar approaches could also be designed for 
other viruses. 
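
The readout logic of the dual color screening system described above can be sketched as a simple classification. This is an illustration only: the function name, the example readings and the 0.5 cutoff on the treated-to-untreated signal ratio are assumptions made for the sketch and are not specified in the text.

```python
def classify_compound(gfp_treated, bfp_treated, gfp_control, bfp_control,
                      cutoff=0.5):
    """Interpret a dual color readout for one compound.

    GFP reports the target promoter; BFP reports a constitutive promoter.
    `cutoff` is a hypothetical threshold on the treated/control ratio
    below which a signal is considered reduced.
    """
    gfp_ratio = gfp_treated / gfp_control
    bfp_ratio = bfp_treated / bfp_control
    if bfp_ratio < cutoff:
        # Loss of the constitutive marker suggests a general effect.
        return "non-specific or cytotoxic"
    if gfp_ratio < cutoff:
        # Green falls while blue is maintained.
        return "specific for target promoter"
    return "no effect"


# Hypothetical fluorescence readings (arbitrary units), control = untreated.
print(classify_compound(20, 95, 100, 100))
print(classify_compound(15, 20, 100, 100))
print(classify_compound(98, 102, 100, 100))
```

Checking the blue signal first reflects the role of BFP as the control marker: a compound is only scored as specific when the constitutive signal is preserved.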

Furthermore, the search for antiparasitic agents 
could also be helped by the DCSS. Established cell lines or
transgenic nematodes or even parasitic extracts where
expression of GFP depends on parasite-specific trans-splicing
sequences while BFP is under the control of host-specific
cis-splicing elements, could provide for rapid screening of selective
antiparasitic drugs.

The invention will be more readily understood by 
reference to the following specific examples which are
included for purposes of illustration only and are not
intended to limit the invention unless so stated. 

EXAMPLES 

The following general protocol was used to generate 
mutant GFP- or BFP-encoding nucleic acids, transform host 
cells, and express the mutant GFP and BFP proteins: 

• Clone a nucleic acid that encodes either wtGFP or
BFP (Tyr67→His), under the control of eukaryotic or
prokaryotic promoters, into a standard ds-DNA plasmid

• Convert the plasmid vector to a ss-DNA by standard
methods

• Anneal the ss-DNA to 40-50 nucleotide DNA oligomers
having base mismatches at the site(s) intended to be
engineered

• Convert the ss-DNA to a closed ds-DNA plasmid vector by
use of DNA polymerase and standard protocols

• Identify plasmids containing the desired mutations by
restriction analysis following plasmid DNA isolation from
E. coli strains transformed with the mutagenized DNA

• Verify the presence of mutations by DNA sequencing

• Transfect human transformed embryonic kidney 293 cells
with equal amounts of DNA from the appropriate plasmids

• Compare the fluorescence intensity of the signals
Nucleic acids and vectors 

The wtGFP cDNA (SEQ ID NO:1) was obtained from Dr.
Chalfie of Columbia University. All mutants described were 
obtained by modifying this wtGFP sequence as detailed below. 

The vectors used to clone and to express the GFPs 
and BFPs are derivatives of the commercially available 
plasmids pcDNA3 (Invitrogen, San Diego, CA), pBSSK+
(Stratagene, La Jolla, CA) and pET11a (Novagen, Madison, WI).

wtGFP protein expression in mammalian cells

Several vectors for the expression of GFP in 
mammalian cells were constructed:

pFRED4 carries the wtGFP sequences under the control of the
cytomegalovirus (CMV) early promoter and the polyadenylation
signal of the Human Immunodeficiency Virus-1 (HIV) 3' Long
Terminal Repeat (LTR). To derive pFRED4 we amplified the GFP
coding sequence from plasmid #TU58 (Chalfie et al., 1994) by
the polymerase chain reaction (PCR). For PCR amplification of
the GFP coding region, oligonucleotides #16417 and #16418 were
used as primers. Oligonucleotide #16417:

5'-GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA-3' (SEQ ID NO:3),
containing the BssHII recognition sequence and the translation
initiation sequence of the HIV-1 Tat protein, was the sense
primer. The antisense primer, #16418:

5'-GGGGGATCCTTATTTGTATAGTTCATCCATGCCATG-3' (SEQ ID NO:4)
contained the BamHI recognition sequence. The amplified
fragment was digested with BssHII and BamHI and cloned into
BssHII and BamHI digested pCMV37M1-10D, a plasmid containing
the CMV early promoter and the HIV-1 p37gag region, followed
by several cloning sites and the HIV-1 3' LTR. Thus the
p37gag gene was replaced by GFP, resulting in pFRED4.
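
As a quick consistency check on the primer design described above, the two primer sequences can be searched for the restriction sites named in the text. The sketch below uses the well known recognition sequences of BssHII (GCGCGC) and BamHI (GGATCC); the primer strings are transcribed from the passage above and should be checked against the sequence listing, and the function names are hypothetical.

```python
SITES = {"BssHII": "GCGCGC", "BamHI": "GGATCC"}

SENSE_16417 = "GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA"
ANTISENSE_16418 = "GGGGGATCCTTATTTGTATAGTTCATCCATGCCATG"


def find_sites(sequence, sites=SITES):
    """Map enzyme name to the 0-based index of its first site, if present."""
    found = {}
    for enzyme, motif in sites.items():
        index = sequence.find(motif)
        if index != -1:
            found[enzyme] = index
    return found


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(sequence))


print(find_sites(SENSE_16417))
print(find_sites(ANTISENSE_16418))
# Read on the coding strand, the antisense primer ends the open reading
# frame with a TAA stop codon immediately before the BamHI site.
print(reverse_complement(ANTISENSE_16418))
```

The sense primer also carries an ATG start codon downstream of the BssHII site, consistent with its description as containing a translation initiation sequence.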

In a second step, the 1485bp fragment from pFRED4,
generated from StuI and BamHI double digestion, was subcloned
into the 4747bp vector derived from the NruI and BamHI double
digestion of pcDNA3. The resulting plasmid, pFRED7 (SEQ ID
NO:5), expresses GFP under the control of the early CMV
promoter and the bovine growth hormone polyadenylation signal.

Bacterial expression 

For bacterial expression, we constructed plasmid 
pBSGFP (SEQ ID NO:6), a pBSSK+ derivative carrying wtGFP.
pBSGFP was generated by inserting the GFP containing region of
pFRED4, digested with BssHII and BamHI and subsequently
treated with Klenow, into the EcoRV digested pBSSK+ vector.
In pBSGFP the wtGFP is fused downstream to the 43 amino acids
of the alpha peptide of beta galactosidase, present in the
pBSSK+ polylinker region. The added amino acids at the
N-terminus of wtGFP have no apparent effect on the GFP signal,
as judged from subsequent plasmids containing precise
deletions of the extra amino acids.

For GFP overexpression and purification we generated
plasmid pFRED13 (SEQ ID NO:7) by ligating the 717bp fragment
from pFRED7 digested with NheI and BamHI, to the 5644bp
fragment resulting from the NheI and BamHI double digestion of
pET11a. In pFRED13, GFP is synthesized under the control of
the bacteriophage T7 phi10 promoter.

The oligonucleotides used for GFP mutagenesis were 
synthesized by the DNA Support Services of the ABL Basic 
Research Program of the National Cancer Institute. DNA 
sequencing was performed by the PCR-assisted fluorescent 
terminator method (Ready Reaction DyeDeoxy Terminator Cycle
Sequencing Kit, ABI, Columbia, MD) according to the
manufacturer's instructions. Sequencing reactions were
resolved on the ABI Model 373A DNA Sequencing System.
Sequencing data were analyzed using the Sequencher program
(Gene Codes, Ann Arbor, MI).

Enzymes were purchased from New England Biolabs 
(Beverly, MA) and used according to conditions described by 
the supplier. Chemicals used for the purification of wild
type and mutant proteins were purchased from SIGMA (St. Louis,
MO). Tissue culture media were obtained from Biofluids
(Rockville, MD) and GIBCO/BRL (Gaithersburg, MD). Competent
bacterial cells were purchased from GIBCO/BRL. 

Preparation of mutants 

Initially, plasmid pBSGFP was used to mutagenize the 
GFP coding sequence by single-stranded DNA site directed
mutagenesis, as described by Schwartz et al. (1992) J. Virol.
66:7176. In addition to changing specific codons, our
strategy was also to improve GFP expression by replacing
potential inhibitory nucleotide sequences without altering the
GFP amino acid sequence. This approach has been successfully
employed in the past for other proteins (Schwartz et al.
(1992) J. Virol. 66:7176).
For the pBSGFP mutagenesis the following
oligonucleotides were used:

#17422 (SEQ ID NO:8):
5'-CAATTTGTGTCGCAGAATGTTGCCATCTTCCTTGAAGTCAATACCTTT-3'

#17423 (SEQ ID NO:9):
5'-GTGTTGTAGTTGCCGTCATCTTTGAAGAAGATGCTCCTTTCCTGTAC-3'

#17424 (SEQ ID NO:10):
5'-CATGGAACAGGCAGTTTGCCAGTAGTGCAGATGAACTTCAGGGTAAGTTTTC-3'

#17425 (SEQ ID NO:11):
5'-CTCCACTGACAGAGAACTTGTGGCCGTTAACATCACCATC-3'

#17426 (SEQ ID NO:12):
5'-CCATCTTCAATGTTGTGGCGGGTCTTGAAGTTCACTTTGATTCCATT-3'

#17465 (SEQ ID NO:13):
5'-CGATAAGCTTGAGGATCCTCAGTTGTACAGTTCATCCATGC-3'

Oligonucleotide #17426 introduces a mutation in GFP, 
converting the isoleucine (Ile) at position 168 into threonine 
(Thr). The Ile168Thr change has been shown to alter the GFP 
spectrum and also to increase the intensity of GFP 
fluorescence by almost two-fold at the emission maxima (Heim 
et al. (1994), supra). 
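
The codon change carried by oligonucleotide #17426 can be checked with a short script. The sketch below is illustrative only: it assumes the oligonucleotide is antisense to the GFP coding strand, that reading frame 0 of its reverse complement is in frame with GFP, and the helper names (reverse_complement, translate) are hypothetical, not part of the original disclosure.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, frame=0):
    """Translate complete codons of seq, starting at the given frame offset."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Oligonucleotide #17426 (SEQ ID NO:12) as listed in the text.
OLIGO_17426 = "CCATCTTCAATGTTGTGGCGGGTCTTGAAGTTCACTTTGATTCCATT"

sense = reverse_complement(OLIGO_17426)
peptide = translate(sense)  # assumed frame 0
print(sense[24:27], peptide)
```

Under these assumptions the ninth codon of the window is ACC (Thr), at the place where the wild-type sequence has Ile 168, which agrees with the Ile168Thr change described above.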

The mutagenesis mixture was used to transform DH5α 
competent E. coli cells. Ampicillin-resistant colonies were 
obtained and examined for their fluorescent properties by 
excitation with UV light. One colony, significantly brighter 
than the rest, was apparent on the agar plate. This colony 
was further purified, and the plasmid DNA was isolated and used to 
transform DH5α competent bacteria. This time all the colonies 
were bright green when excited with UV light, indicating 
that the bright green fluorescence was associated with the 
presence of the plasmid. The sequence of the GFP segment 
(SEQ ID NO:14, representing only the segment and not the whole 
plasmid) of this plasmid, called pBSGFPsg11, was then 
determined. The sequence analysis revealed that, in addition 
to the designed nucleotide changes, which do not alter the 
amino acid sequence of GFP, and the Ile168Thr mutation, a 
second, spontaneous mutation had occurred. A thymidine at 
position 322 of SEQ ID NO:14, which is the GFP-coding region 
of the pBSGFPsg11 DNA, was replaced by a cytosine. This 
nucleotide change converts the phenylalanine (Phe) at position 
65 of the GFP amino acid sequence into a leucine (Leu). A 
series of experiments, which will be described below, 
demonstrated that the Phe65Leu mutation was indeed responsible 
for the increase in the intensity of the fluorescent GFP 
signal. 

In subsequent experiments, involving generation of 
rationally designed GFP mutant combinations to be detailed 
below, we also used the single-stranded DNA site-directed 
mutagenesis approach. This time, however, the template DNAs 
were pFRED7 derivatives instead of pBSGFP. 

Transfection and expression 

The 293 cell line, an adenovirus-transformed human 
embryonal kidney cell line (Graham et al. (1977), J. Gen. 
Virol. 5:59), was used for protein expression analysis. The 
cells were cultured in Dulbecco's modified culture medium 
(DMEM) supplemented with 10% heat-inactivated fetal bovine 
serum (FBS, Biofluids). 

Transfection was performed by the calcium phosphate 
coprecipitation technique as previously described (Graham et 
al. (1973), Virol. 52:456; Felber et al. (1990), J. Virol. 
64:3734). Plasmid DNA was purified on Qiagen columns according 
to the manufacturer's instructions (Qiagen). A mix of 5 to 10 
µg of total DNA per ml of final precipitate was overlaid on 
the cells in 60 mm or 6- and 12-well tissue culture plates 
(Falcon), using 0.5, 0.25 and 0.125 ml of precipitate, 
respectively. After overnight incubation, the cells were 
washed, placed in medium without phenol red and measured in a 
plate spectrofluorometer, e.g., Cytofluor II (Perceptive 
Biosystems, Framingham, MA). 
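
The DNA load per culture vessel follows directly from the precipitate volumes and the 5 to 10 µg/ml range given above. The sketch below is a minimal illustration, assuming the stated volumes apply per dish or per well; the dictionary and function names are hypothetical.

```python
# Precipitate volume (ml) overlaid per vessel, as stated in the text.
PRECIPITATE_ML = {
    "60 mm dish": 0.5,
    "6-well plate": 0.25,
    "12-well plate": 0.125,
}


def dna_per_vessel_ug(vessel, ug_per_ml):
    """Micrograms of DNA delivered to one dish or well."""
    return PRECIPITATE_ML[vessel] * ug_per_ml


for vessel in PRECIPITATE_ML:
    low = dna_per_vessel_ug(vessel, 5.0)    # lower bound of the stated range
    high = dna_per_vessel_ug(vessel, 10.0)  # upper bound of the stated range
    print(f"{vessel}: {low:.3f} to {high:.3f} ug DNA")
```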

Purification of wild-type and mutant proteins: 

E. coli strains carrying pFRED13 or other pET11a 
derivatives with mutant GFP genes were used for the 
overproduction and purification of the wt and mutant GFPs or 
BFPs. The cells were grown in 1 liter of LB broth containing 100 
µg/ml ampicillin at 32°C to a density of 0.6-0.8 optical 
density units at 600 nm. At this point, the cells were 
induced with 0.6 mM IPTG and incubated for four more hours. 
Following harvesting of the cell pellets, cellular extracts 
were prepared as described by Johnson, B.H. and Hecht, M.H., 
1994, Biotechnol. 12:1357. 

GFPs and BFPs were purified from the cellular 
extracts as follows: Ammonium sulfate (AS) was first added to 
the extracts (50 g AS per 100 g supernatant) to precipitate the 
proteins. The precipitates were collected by centrifugation 
at 7500 x g for 15 min and the pellets were dissolved in 5 ml 
of 1 M AS. The samples were then loaded on a phenyl-Sepharose 
column (HR10/10, Pharmacia, Piscataway, NJ) and washed with 20 
mM 2-[N-morpholino]ethanesulfonic acid (MES), pH 5.6, and 1 M 
AS. Proteins were eluted with a 45 ml gradient to 20 mM MES, 
pH 5.6. Fractions containing the GFP or BFP protein were 
colored even under visible light. 
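
For planning fraction collection, the salt concentration at any point of the elution can be estimated. The sketch below assumes a linear gradient (the text says only "gradient") running from 1 M AS to 0 M AS over 45 ml; the function name and the linearity are assumptions for illustration.

```python
def linear_gradient(volume_ml, total_ml, start_molar, end_molar):
    """Concentration after volume_ml of a linear gradient of total_ml."""
    if not 0.0 <= volume_ml <= total_ml:
        raise ValueError("volume must lie within the gradient")
    fraction = volume_ml / total_ml
    return start_molar + fraction * (end_molar - start_molar)


# Ammonium sulfate concentration halfway through the assumed 45 ml gradient.
midpoint_as = linear_gradient(22.5, 45.0, 1.0, 0.0)
print(f"AS at 22.5 ml: {midpoint_as:.2f} M")
```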

Green or blue-colored fractions were further 
purified on Q-Sepharose (Mono Q, HR5/5, Pharmacia) with a 20 
ml gradient from 20 mM Tris pH 7.0 to 20 mM Tris pH 7.0, 0.25 
M NaCl. 

The AS precipitation step was performed at 4°C, 
while the chromatographic procedures were performed at room 
temperature. 

Determination of protein concentration 

Protein concentrations were determined using the 
commercially available Bradford protein assay (BioRad, 
Hercules, CA) with bovine IgG protein as a standard. 

Analytical polyacrylamide gels 

Analytical polyacrylamide gel electrophoresis was 
used to visualize the degree of purity of the purified GFP or 
BFP proteins. In all cases, 1 mm thick, 12% acrylamide gels 
(containing 0.1% SDS, in Tris buffer, pH 7.4) were used, and 
electrophoresis was performed for 2 hours at 120 V. Gels were 
stained with Coomassie Blue to visualize the proteins. 

Fluorescence measurements 

Excitation and emission spectra of solutions of the 
fluorescent proteins were obtained using a Perkin-Elmer LS50B 
spectrofluorimeter (Perkin Elmer, Advanced Biosystems, Foster 
City, CA). 

The relative fluorescence data for the GFP mutants 
in Table I below were obtained by comparing the cellular 
fluorescence of the GFP mutants expressed in the transformed 
human embryonic kidney cell line 293 with wt GFP expressed in 
the same cell line. Likewise, the relative fluorescence data 
for the BFP mutants in Table I below were obtained by 
comparing the cellular fluorescence of the BFP mutants 
expressed in 293 cells with BFP(Tyr67→His) expressed in the 
same cell line. Equal amounts of DNA encoding wild type or 
mutant proteins were introduced into 293 cells. Cellular 
fluorescence was quantified 24 or 48 hr post-transfection 
using Cytofluor II. 
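
The comparison described above reduces to a ratio of plate-reader signals. The sketch below is one plausible way to compute such a fold increase; the background subtraction (for example, from mock-transfected cells) and the numeric readings are assumptions made up for illustration and are not data from this document.

```python
def fold_increase(mutant_signal, reference_signal, background=0.0):
    """Ratio of background-corrected mutant signal to reference signal."""
    reference = reference_signal - background
    if reference <= 0:
        raise ValueError("reference signal must exceed background")
    return (mutant_signal - background) / reference


# Hypothetical readings in arbitrary fluorescence units.
example = fold_increase(mutant_signal=2100.0,
                        reference_signal=300.0,
                        background=100.0)
print(f"fold increase: {example:.1f}")
```

Because equal amounts of DNA are introduced for mutant and reference, the ratio reflects cellular fluorescence per unit of transfected DNA rather than brightness per protein molecule.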

A list of GFP mutant proteins indicating the 
introduced amino acid mutations is shown in Table I. 



TABLE I: GFP and BFP mutants 

                   Amino Acid Position 
PROTEIN    65    66    67    164    168    239 
wt GFP     F     S     Y     V      I      K 
SG12       L 
SG11       L                        T      N 
SG25       L     C                  T      N 
BFP                    H 
SB42       L           H 
SB49                   H     A 
SB50       L           H     A 







Example 1: SG12 

A number of the unique mutants described herein 
derive from the discovery of an unplanned and unexpected 
mutation called "SG12", obtained in the course of site-directed 
mutagenesis experiments, wherein a phenylalanine at 
position 65 of wtGFP was converted to leucine. SG12 was 
prepared as follows: Two plasmids carrying SG12 (SEQ ID NO:15) 
were generated: pFRED12 for expression in mammalian cells, and 
pFRED16 for expression in E. coli and protein purification. 
pFRED12 was constructed by ligating the 1557 bp fragment from 
the double digestion of pFRED7 with AvrII and PmlI into the 
4681 bp fragment generated from the AvrII and PmlI digestion 
of pFRED11 (see below). pFRED16 was derived by subcloning the 
717 bp segment resulting from the digestion of pFRED12 with 
NheI and BamHI into the 5644 bp fragment of the pET11a vector 
digested with the same restriction enzymes. 
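
As a bookkeeping aid, the expected size of each construct can be estimated by summing the fragment lengths quoted above. This is a rough sketch that assumes no bases are gained or lost at the ligation junctions; the function name is hypothetical.

```python
def ligated_size_bp(*fragments_bp):
    """Approximate plasmid size as the sum of the ligated fragment lengths."""
    return sum(fragments_bp)


# Fragment lengths quoted in the text for the two SG12 plasmids.
pfred12_bp = ligated_size_bp(1557, 4681)  # insert from pFRED7 + backbone
pfred16_bp = ligated_size_bp(717, 5644)   # NheI-BamHI segment + pET11a fragment
print(pfred12_bp, pfred16_bp)
```

Comparing such estimates with the band sizes seen on a diagnostic restriction digest is a common sanity check for a cloning step.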

The specific activity of SG12 was about 9-12 times 
that of wtGFP. See Table II. 

Example 2: SG11 

A mutant referred to as "SG11," which combined the 
phenylalanine 65 to leucine alteration with an isoleucine 168 
to threonine substitution and a lysine 239 to asparagine 
substitution, gave a further enhanced fluorescence intensity. 
SG11 was prepared as follows: Two plasmids carrying SG11 (SEQ 
ID NO:16) were generated: pFRED11 for expression in mammalian 
cells and pFRED15 for expression in E. coli and protein 
purification. pFRED11 was constructed by ligating the 717 bp 
region from pBSGFPsg11 DNA digested with NheI and BamHI to the 
5221 bp fragment derived from the digestion of pFRED7 with the 
same enzymes. pFRED15 was generated by subcloning the 717 bp 
segment resulting from the digestion of pFRED11 with NheI and 
BamHI into the 5644 bp fragment of the pET11a vector, digested 
with the same restriction enzymes. 

The mutant SG11 encodes an engineered GFP wherein 
the alteration comprises the conversion of phenylalanine 65 to 
leucine and the conversion of isoleucine 168 to threonine. 
The additional alteration of the C-terminal Lys 239 to Asn is 
without effect; the C-terminal Lys or Asn may be deleted 
without affecting fluorescence. The specific activity of SG11 
is about 19-38 times that of wtGFP. See Table II. 

Example 3: SG25 

A third and further improved GFP mutant was obtained 
by further mutating "SG11." This mutant is referred to as 
"SG25" and comprises, in addition to the SG11 substitutions, 
an additional substitution of a cysteine for the serine 
normally found at position 66 in the sequence. SG25 was 
prepared as follows: Two plasmids carrying SG25 (SEQ ID NO:17) 
were generated: pFRED25 for expression in mammalian cells and 
pFRED63 for expression in E. coli and protein purification. 
pFRED25 was constructed by site-directed mutagenesis of 
pFRED11, using oligonucleotide #18217 (SEQ ID NO:18): 
5'-CATTGAACACGATAGCACAGAGTAGTGACTAGTGTTGGCC-3'. This 
oligonucleotide incorporates the Ser66Cys mutation into SG11. 
Ser66Cys had been shown both to alter the GFP excitation 
maxima without significant change in the emission spectrum and 
to increase the intensity of the fluorescent signal of 
GFP (Heim et al., 1995). 

pFRED63 was generated by subcloning the 717 bp 
segment resulting from the digestion of pFRED25 with NheI and 
BamHI into the 5644 bp fragment of the pET11a vector, digested 
with the same restriction enzymes. 

The mutant SG25 encodes an engineered GFP wherein 
the alteration comprises the conversion of phenylalanine 65 to 
leucine, the conversion of isoleucine 168 to threonine and the 
conversion of serine 66 to cysteine. As with SG11, the 
additional alteration of the C-terminal lysine 239 to 
asparagine is without effect; the C-terminal lysine or 
asparagine may be deleted without affecting fluorescence. The 
specific activity of SG25 is about 56 times that of wtGFP. 
See Table II. 



Example 4: Additional green fluorescent mutants 

Additional alterations at different amino acids of 
the wtGFP, when combined with SG11 and SG25, yielded proteins 
having at least five-fold greater cellular fluorescence compared to 
the wtGFP. A non-limiting list of these mutations is provided 
below: 

GFP variants with enhanced cellular fluorescence 

Protein    Altered Amino Acids 
SG20       F65L, S66T, I168T, K239N 
SG21       F65L, S66A, I168T, K239N 
SG27       Y40L, F65L, I168T, K239N 
SG30       F47L, F65L, I168T, K239N 
SG32       F72L, F65L, I168T, K239N 
SG43       F65L, I168T, Y201L, K239N 
SG46       F65L, V164A, I168T, K239N 
SG72       F65L, S66C, V164A, I168T, K239N 
SG91       F65L, S66C, F100L, I168T, K239N 
SG94       F65L, S66C, Y107L, I168T, K239N 
SG95       F65L, S66C, F115L, I168T, K239N 
SG96       F65L, S66C, F131L, I168T, K239N 
SG98       F65L, S66C, Y146L, I168T, K239N 
SG100      F65L, S66C, Y152L, I168T, K239N 
SG101      F65L, S66C, I168T, Y183L, K239N 
SG102      F65L, S66C, I168T, F224L, K239N 
SG103      F65L, S66C, I168T, Y238L, K239N 
SG106      F65L, S66T, V164A, I168T, K239N 
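
The substitution codes used in this list (wild-type residue, position, new residue) are easy to handle programmatically. The following sketch parses such codes and applies them to a short window of the wild-type sequence; the window (residues 61 to 72 as numbered in SEQ ID NO:2) and all function names are illustrative assumptions.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code):
    """Split a code such as 'F65L' into (wild type, position, replacement)."""
    match = MUTATION.match(code.strip())
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutations(window, first_position, codes):
    """Apply substitutions to a sequence window that starts at first_position."""
    residues = list(window)
    for code in codes:
        wild_type, position, replacement = parse_mutation(code)
        index = position - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"{code} lies outside the window")
        if residues[index] != wild_type:
            raise ValueError(f"{code}: window has {residues[index]} here")
        residues[index] = replacement
    return "".join(residues)


# Residues 61-72 of wild-type GFP, one-letter code (assumed from SEQ ID NO:2).
WT_61_72 = "LVTTFSYGVQCF"
sg25_window = apply_mutations(WT_61_72, 61, ["F65L", "S66C"])
print(sg25_window)
```

The wild-type check in apply_mutations guards against numbering mismatches, which matter here because the numbering of this document includes the alanine inserted after the initiator methionine.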



Example 5: SB42 

The blue fluorescent proteins described here and 
below were derived from the known GFP mutant (Heim et al., 
PNAS, 1994) wherein histidine is substituted for tyrosine at 
position 67. We have designated this known mutant 
BFP(Tyr67→His). BFP(Tyr67→His) has a shifted emission 
spectrum. It emits blue light, i.e., it is a blue fluorescent 
protein (BFP). 

By introducing the same mutation in BFP(Tyr67→His) 
that was used to generate SG12, i.e., leucine for 
phenylalanine at position 65, we created a new mutant that has 
unexpectedly high fluorescence, which we refer to as "SuperBlue-42" 
(SB42). SB42 was prepared as follows: Two plasmids 
carrying SB42 (SEQ ID NO:19) were generated: pFRED42 for 
expression in mammalian cells and pFRED65 for expression in E. 
coli and protein purification. pFRED42 was constructed by 
site-directed mutagenesis of pFRED12, using oligonucleotide 
#bio25 (5'-CATTGAACACCATGAGAGAGAGTAGTGACTAGTGTTGGCC-3') (SEQ ID 
NO:20). This oligonucleotide incorporates the Tyr67→His 
mutation into SG12, thus generating the Phe65Leu, Tyr67→His 
double mutant. 

pFRED65 was created by subcloning the 717 bp segment 
resulting from the digestion of pFRED42 with NheI and BamHI into 
the 5644 bp fragment of the pET11a vector, digested with the 
same restriction enzymes. 

The mutant SB42 encodes an engineered BFP wherein 
the alterations comprise the conversion of tyrosine 67 to 
histidine and the conversion of phenylalanine 65 to leucine. 
The specific activity of SB42 is about 27 times that of 
BFP(Tyr67→His). See Table II. 

Example 6: SB49 

An independent mutation of BFP(Tyr67→His) which 
substitutes the valine at position 164 with an alanine is 
referred to as "SB49." SB49 was prepared as follows: Plasmid 
pFRED49 expresses SB49 (SEQ ID NO:21) in mammalian cells. 
pFRED49 was generated by site-directed mutagenesis of pFRED12, 
using oligonucleotides #19059 and #bio24. Oligonucleotide 
#19059 (5'-CTTCAATGTTGTGGCGGATCTTGAAGTTCGCTTTGATTCCATTC-3') 
(SEQ ID NO:22) introduces the Val164Ala mutation in SG12, while 
oligonucleotide #bio24 
(5'-CATTGAACACCATGAGAGAAAGTAGTGACTAGTGTTGGCC-3') (SEQ ID NO:23) 
reverts the Phe65Leu alteration to the wt sequence and, at the 
same time, incorporates the Tyr67→His mutation. 

The mutant SB49 encodes an engineered BFP wherein 
the alterations comprise the conversion of tyrosine 67 to 
histidine, and the conversion of valine 164 to alanine. The 
specific activity of SB49 was about 37 times that of 
BFP(Tyr67→His). See Table II. 

Example 7: SB50 

A combination of the above two BFP mutations 
resulted in "SB50," which gave an even greater fluorescence 
enhancement than either of the previous mutations. SB50 was 
prepared as follows: Two plasmids carrying SB50 (SEQ ID NO:24) 
were generated: pFRED50 for expression in mammalian cells 
and pFRED67 for expression in E. coli and protein 
purification. pFRED50 was constructed by site-directed 
mutagenesis of pFRED12, using oligonucleotides #19059 and 
#bio25. 

pFRED67 was created by subcloning the 717 bp segment 
resulting from the digestion of pFRED50 with NheI and BamHI into 
the 5644 bp fragment of the pET11a vector digested with the 
same restriction enzymes. 

The mutant SB50 encodes an engineered BFP wherein 
the alterations comprise the conversion of tyrosine 67 to 
histidine, the conversion of phenylalanine 65 to leucine and 
the conversion of valine 164 to alanine. The specific 
activity of SB50 was about 63 times that of BFP(Tyr67→His). 
See Table II. 



TABLE II 

Mutant   Excitation   Emission   Factor of increased green   Factor of increased blue 
         Maximum      Maximum    fluorescence (at maximum    fluorescence (at maximum 
         (nm)         (nm)       emission) as compared to    emission) as compared to 
                                 wtGFP                       BFP(Tyr67→His) 

SG12     398          509        9-12X 
SG11     471          508        19-38X 
SG25     473          509        50-100X 
SB42     387          450                                    27X 
SB49     387          450                                    37X 
SB50     387          450                                    63X 



The dramatic increase in fluorescent activity 
resulting from the amino acid substitutions of the present 
invention was wholly unexpected. The cellular fluorescence of 
the mutants was at least five times greater, and usually over 
twenty times greater, than that of the parent wtGFP or 
BFP(Tyr67→His). Note that the maximum emission wavelengths 
vary among the mutants, and that the above-reported fold 
increases refer only to minimal increases in relative cellular 
fluorescence at the maximum emission wavelength of the mutant. 
Given a particular wavelength, the values may be substantially 
larger, i.e., the mutants may have a 200-fold greater cellular 
fluorescence than the reference wtGFP or BFP(Tyr67→His). This 
is important because devices for measuring fluorescence often 
have set wavelengths, or the limitations of a given experiment 
often require the use of a set wavelength. Thus, for example, 
the emission and detection parameters of a fluorescence 
microscope or a fluorescence-activated cell sorter may be set 
for a wavelength wherein the cellular fluorescence of a given 
mutant is 200-fold greater than that of the known GFPs and 
BFPs. 

The GFP and BFP mutants of this invention, in 
contrast to the wild type protein or other reported mutants, 
allow detection of green fluorescence in living mammalian 
cells when present in few copies stably integrated into the 
genome. This high cellular fluorescence of the mutant GFPs 
and BFPs is useful for rapid and simple detection of gene 
expression in living cells and tissues and for repeated 
analysis of gene expression over time under a variety of 
conditions. They are also useful for the construction of 
stable marked cell lines that can be quickly identified by 
fluorescence microscopy or fluorescence-activated cell 
sorting. 

Example 8 

We have established fluoroplate-based assays for the 
quantitation of gene expression after transfections. In a 
number of embodiments, a nucleic acid encoding a mutant GFP or 
BFP of this invention is inserted into a vector and introduced 
into and expressed in a cell. Typically, expression of GFP 
mutants can be detected as quickly as 5 hours post-infection 
or less. Expression is followed over time in living cells by 
a simple measurement in multi-well plates. In this way, many 
transfections can be processed in parallel. 

Example 9 

The vectors and nucleic acids provided herein are 
used to generate chimeric proteins wherein a nucleic acid 
sequence that encodes a selected gene product is fused to the 
C- or N-terminus of the mutant GFPs and/or BFPs of this 
invention. A number of unique viral, plasmid and hybrid gene 
constructs have been generated that incorporate the new mutant 
GFP and/or mutant BFP sequences indicated above. These 
include: 

• HIV viral sequences (in the nef gene) containing SG11 or 
SG25 

• Neomycin and hygromycin plasmids containing SG11 or SG25 

• Moloney Leukemia Virus vector (retrovirus) also 
expressing SG25 

• Hybrid gene constructs expressing HIV viral proteins 
(rev, td-rev, tat, nef, gag, env, and vpr) and either 
SG11 or SG25 or SB50. 

• Hybrid gene construct containing vectors that incorporate 
the cytoplasmic proteins ran, B23, nucleolin, poly-A 
binding protein and either SG11 or SG25 or SB50. 

These hybrids of the mutant nucleic acids provided 
herein are used to study protein trafficking in living 
mammalian cells. Like the wild type GFP, the mutant GFP 
proteins are normally distributed throughout the cell except 
for the nucleolus. Fusions to other proteins redistribute the 
fluorescence, depending on the partner in the hybrid. For 
example, fusion with the entire HIV-1 Rev protein results in a 
hybrid molecule which retains the Rev function and is 
localized in the nucleolus where Rev is preferentially found. 
Fusion to the N-terminal domain of the HIV-1 Nef protein 
created a chimeric protein detected in the plasma membrane, 
the site of Nef localization. 

Example 10: pCMVgfo11 

pCMVgfo11 is a pFRED11 derivative containing the 
bacterial neomycin phosphotransferase gene (neo) (Southern and 
Berg (1982) J. Mol. Appl. Genetics 1:327) fused at the 
C-terminus of SG11. A four amino acid (Gly-Ala-Gly-Ala) (SEQ 
ID NO:26) linker region connects the last amino acid of SG11 
to the second amino acid of neo, thus generating the hybrid 
SG11-neo protein (gfo11, SEQ ID NO:25). Gfo11 is expressed 
from the CMV promoter and contains the intact SG11 polypeptide 
and all of neo except for the first Met. 
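
The architecture of the hybrid (the full N-terminal partner, a four-residue linker, then the C-terminal partner without its initiator Met) can be written as a simple string operation. In the sketch below the two short peptides are placeholders chosen for illustration, not the full SG11 or neo sequences, and build_fusion is a hypothetical helper.

```python
LINKER = "GAGA"  # Gly-Ala-Gly-Ala, one-letter code


def build_fusion(n_partner, c_partner, linker=LINKER):
    """Join two proteins through a linker, dropping the initiator Met
    of the C-terminal partner."""
    if not c_partner.startswith("M"):
        raise ValueError("C-terminal partner is expected to start with Met")
    return n_partner + linker + c_partner[1:]


# Placeholder fragments: a C-terminal stretch of the fluorescent protein
# and the first residues of a downstream partner (illustrative only).
fp_tail = "MDELYN"
partner_head = "MIEQDG"
fusion = build_fusion(fp_tail, partner_head)
print(fusion)
```

Keeping the fluorescent protein intact and placing a short flexible linker before the partner is the design the text describes; the sketch only mirrors that layout and says nothing about folding or activity.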

pCMVgfo11 was constructed in several steps. First, 
pFRED11DNae was constructed by NaeI digestion of pFRED11 and 
self-ligation of the 4613 bp fragment. The NaeI deletion 
removes the SV40 promoter and neo gene from pFRED11, thus 
creating pFRED11DNae. Next, in order to fuse the neo coding 
region downstream of SG11, the neo gene was PCR amplified from 
pcDNA3 using primers Bio51 
(5'-CGCGGATCCTTCGAACAAGATGGATTGCACGC-3') (SEQ ID NO:27) and 
Bio52 (5'-CCGGAATTCTCAGAAGAACTCGTCAAGAAGGCGA-3') (SEQ ID 
NO:28). Primer Bio51 introduces a BamHI site followed by a 
BstBI recognition sequence at the 5' end of neo, while primer 
Bio52 introduces an EcoRI site 3' to the neo gene. The PCR 
product was digested with BamHI and EcoRI and cloned into the 
4562 bp vector resulting from the BamHI-EcoRI digestion of 
pFRED11DNae, thus generating pFRED11DNaeBstNeo. Subsequently, 
SG11 was PCR amplified from pFRED11DNae using primers Bio49 
(5'-GGCGCGCAAGAAATGGCTAGCAAAGGAGAAGAACTCTTCACTGGAG-3') (SEQ ID 
NO:29) and Bio50 
(5'-CCCATCGATAGCACCAGCACCGTTGTACAGTTCATCCATGCCATGT-3') (SEQ ID 
NO:30) to remove the sg11 stop codon in pFRED11DNaeBstNeo and 
to introduce the four amino acid (Gly-Ala-Gly-Ala) linker 
followed by a ClaI site. The PCR product was digested with 
NheI and ClaI and cloned into the 4763 bp NheI-BstBI fragment 
from pFRED11DNaeBstNeo, thus generating pCMVgfo11. 
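
The restriction sites that the primers are said to introduce can be located by a plain substring search, since each recognition sequence is palindromic and therefore reads the same on both strands. The sketch below uses the primer sequences as given in the text with the extraction spacing removed; the site dictionary and function name are illustrative assumptions.

```python
# Recognition sequences of the enzymes named in the text.
SITES = {
    "BamHI": "GGATCC",
    "BstBI": "TTCGAA",
    "EcoRI": "GAATTC",
    "ClaI": "ATCGAT",
    "NheI": "GCTAGC",
}


def find_sites(seq, sites=SITES):
    """Map each enzyme whose site occurs in seq to the 0-based index
    of the first occurrence."""
    return {name: seq.find(motif) for name, motif in sites.items()
            if motif in seq}


BIO51 = "CGCGGATCCTTCGAACAAGATGGATTGCACGC"
BIO52 = "CCGGAATTCTCAGAAGAACTCGTCAAGAAGGCGA"
BIO50 = "CCCATCGATAGCACCAGCACCGTTGTACAGTTCATCCATGCCATGT"

for name, primer in (("Bio51", BIO51), ("Bio52", BIO52), ("Bio50", BIO50)):
    print(name, find_sites(primer))
```

The search finds BamHI followed by BstBI in Bio51, EcoRI in Bio52 and ClaI in Bio50, matching the description of what each primer contributes.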

Following transfection of 293 cells (Graham et al. 
(1977), J. Gen. Virol. 5:59) as well as other human and mouse 
cell lines with pCMVgfo11, bright fluorescent transfectants 
were apparent under the fluorescent microscope and colonies 
resistant to G418 could be obtained two weeks later. 

It should be noted that pCMVgfo11 was the best 
protein fusion in terms of fluorescent emission intensity and 
number of G418-resistant colonies compared to several SG11-neo 
or neo-SG11 fusions generated and examined. 

Example 11: pPGKgfo25 

pPGKgfo25 is a pCMVgfo11 derivative containing SG25 
instead of SG11 within gfo (SEQ ID NO:31). Expression of 
gfo25 in pPGKgfo25 is under the control of the mouse 
phosphoglycerate kinase-1 (PGK) promoter. 

pPGKgfo25 was constructed in several steps. First, 
a SacII site was introduced downstream of the PGK promoter in 
pPGKneobpA (Soriano et al. (1991) Cell 64:393) by: 

i) annealing oligonucleotides #18990 (SEQ ID NO:32) 
(5'-GACCGGGACACGTATGCAGCCTCCGC-3') and #18991 (SEQ ID 
NO:33) (5'-GGAGGCTGGATACGTGTCCCGGTCTGCA-3') to create a 
double-stranded adapter for PstI at the 5' end and SacII 
at the 3' end. 

ii) ligating this adapter to the 3423 bp fragment from the 
PstI-SacII double digestion of pPGKneobpA, thus 
generating pPGKPtAfSc. 

Next, the CMV promoter of pFRED25 was replaced with the PGK 
promoter by cloning the 565 bp SalI (filled with Klenow)-SacII 
region from pPGKPtAfSc to the 5288 bp BglII (filled with 
Klenow)-SacII fragment from pFRED25, resulting in pFRED25PGK. 
In the final step, pPGKgfo25 was constructed by ligating the 
813 bp BglII-NdeI fragment from pFRED25PGK containing the PGK 
promoter and SG25, to the 4185 bp BglII-NdeI fragment of 
pCMVgfo11. 



Example 12: pGen-PGKgfo25RO (SEQ ID NO:34) 

pGen-PGKgfo25RO is a pGen- (Soriano et al. (1991), J. 
Virol. 65:2314) derivative containing the gfo25 hybrid under 
the control of the PGK promoter. It was constructed by subcloning 
the 2810 bp SalI fragment of pPGKgfo25 into the XhoI site of 
pGen. In viruses generated from pGen-PGKgfo25RO (see below) 
transcription originating from the PGK promoter is in reverse 
orientation (RO) to that initiated from the viral long 
terminal repeats (LTR). 

To generate ecotropic or pseudotyped viruses, 
pGen-PGKgfo25RO was co-transfected into 293 cells together 
with pHIT60 and pHIT123 DNAs (production of ecotropic virus) 
or with pHIT60 and pHCMV-G DNAs (production of pseudotyped 
virus). pHIT60 and pHIT123 contain the gag-pol and env coding 
regions from the Moloney murine leukemia virus (Mo-MLV), 
respectively, under the control of the CMV promoter (Soneoka 
et al. (1995), Nuc. Acid Res. 23:628). pHCMV-G contains the 
coding region of the G protein from the vesicular stomatitis 
virus (VSV) expressed from the CMV promoter (Yee et al. 
(1994), Proc. Natl. Acad. Sci. USA 91:9564). Virus-containing 
supernatants were harvested 48 hours post-transfection, 
filtered and stored at -80°C. 

Example 13: pNLnSG11 (SEQ ID NO:35) 

The SG11 sequence from plasmid pFRED11 was PCR-amplified 
with primers #17982 (SEQ ID NO:36) 
(5'-GGGGCGTACGGAGCGCTCCGAATTCGGTACCGTTTAAACGGGCCCTCTCGAGTCC 
GTTGTACAGTTCATCCATG-3') and #17983 (SEQ ID NO:37) 
(5'-GGGGGAATTCGCGCGCGTACGTAAGCGCTAGCTGAGCAAGAAATGGCTAGCAAA 
GGAGAAGAACTC-3'). The PCR product was digested with BlpI and 
XhoI and cloned into the large BlpI-XhoI fragment from pNL4-3 
(Adachi et al. (1986), J. Virol. 59:284). In pNLnSG11 the 
full SG11 polypeptide, containing an additional four 
linker-encoded amino acids at the C-terminus, is expressed as 
a hybrid protein with the 24 N-terminal amino acids of the 
HIV-1 protein Nef. 

We constructed transmissible HIV-1 stocks with our 
mutants, which generate green fluorescence upon transfection 
of human cells. These transmissible HIV-1 stocks are used to 
detect the kinetics of infection under a variety of 
conditions. In particular, they are used to study the effects 
of drugs on the kinetics of infection. The level of 
fluorescence, and the subcellular compartmentalization of that 
fluorescence, is easily visualized and quantified using well 
known methods. This system is easy to visualize, and 
dramatically cuts the costs of many experiments that are 
presently tedious and expensive. 

To produce infectious virus, pNLnSG11 was 
transfected into 293 cells. 24 hours later, Jurkat cells were 
added to the transfectants. At various times post-infection, 
the medium was removed, filtered, and used to infect fresh 
Jurkat or other HIV-1-permissive cells. Two days later the 
infected cells were green under the fluorescent microscope. 
Visible syncytia were also green. Viral stocks were generated 
and kept at -80°C. 

When the nucleic acids, vectors, and mutant proteins 
provided herein are combined with the knowledge of those 
skilled in the art of genetic engineering and the guidance 
provided herein, it will be apparent to one of ordinary skill 
in the art that many changes and modifications can be made 
thereto without departing from the spirit or scope of the 
invention as set forth herein. These changes and 
modifications are encompassed by the present invention. 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

     (i) APPLICANT: Pavlakis, George N. 
                    Gaitanaris, George A. 
                    Stauber, Roland H. 
                    Vournakis, John N. 

    (ii) TITLE OF INVENTION: Mutant Aequorea victoria Fluorescent 
         Proteins Having Increased Cellular Fluorescence 

   (iii) NUMBER OF SEQUENCES: 37 

    (iv) CORRESPONDENCE ADDRESS: 
          (A) ADDRESSEE: Townsend and Townsend and Crew LLP 
          (B) STREET: Two Embarcadero Center, 8th Floor 
          (C) CITY: San Francisco 
          (D) STATE: California 
          (E) COUNTRY: USA 
          (F) ZIP: 94111-3834 

     (v) COMPUTER READABLE FORM: 
          (A) MEDIUM TYPE: Floppy disk 
          (B) COMPUTER: IBM PC compatible 
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS 
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

    (vi) CURRENT APPLICATION DATA: 
          (A) APPLICATION NUMBER: US Not yet assigned 
          (B) FILING DATE: Not yet assigned 
          (C) CLASSIFICATION: 

  (viii) ATTORNEY/AGENT INFORMATION: 
          (A) NAME: Weber, Kenneth A. 
          (B) REGISTRATION NUMBER: 31,677 
          (C) REFERENCE/DOCKET NUMBER: 015280-249000 

    (ix) TELECOMMUNICATION INFORMATION: 
          (A) TELEPHONE: (415) 576-0200 
          (B) TELEFAX: (415) 576-0300 



4 5 (2) INFORMATION FOR SEQ ID NO : 1 . 

(l) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 72 0 base parr: 

(B) TYPE: nucleic acid 
50 (CJ STRANDEDNESS : single 

(D) TOPOLOGY : linear 



55 



(li) MOLECULE TYPE : cDNA 



Ux) FEATURE: 

(A) NAME / KEY : CDS 

(B) LOCATION : 1..720 

(D) OTHER INFORMATION: /product = "wild type Aequorea victoria 
60 Green Fluorescent Protein (wtGF) " 
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30 



4 0 



50 
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60 



61 

(xi ) SEQUENCE DESCRIPTION: SEQ ID NO : 1 



ATG GCT AGC AAA GGA GAA GAA CTC TTC ACT GGA GTT GTC CCA ATT CTT 
Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro He Leu 



-15 



AAC TAT AAC TCA CAC AAT GTA TAC ATC ATG GCA GAC AAA CAA AAG AAT 
Asn Tyr Asn Ser Hxs Asn Val Tyr He Met Ala Asp Lys Gin Lys Asn 
145 150 i 5 5 160 



4 8 



96 



L44 



GTT GAA TTA GAT GGT GAT GTT AAT GGG CAC AAA TTT TCT GTC AGT GGA 
Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly 
2 0 25 30 

GAG GGT GAA GGT GAT GCA ACA TAC GGA AAA CTT ACC CTT AAA TTT ATT 
Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe He 
35 40 45 

TGC ACT ACT GGA AAA CTA CCT GTT CCA TGG CCA ACA CTT GTC ACT ACT 
Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr 
50 55 60 

TTC TCT TAT GGT GTT CAA TGC TTT TCA AGA TAC CCG GAT CAT ATG AAA 
Phe Ser Tyr Gly Val Gin Cys Phe Ser Arg Tyr Pro Asp His Met Lys 
65 7 ° 75 so 

CGG CAT GAC TTT TTC AAG AGT GCC ATG CCC GAA GGT TAT GTA CAG GAA 
Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gin Glu 
85 90 95 

AGA ACT ATA TTT TTC AAA GAT GAC GGG AAC TAC AAG ACA CGT GCT GAA 3 36 

Arg Thr He Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu 
100 105 no 

GTC AAG TTT GAA GGT GAT ACC CTT GTT AAT AGA ATC GAG TTA AAA GGT 3 84 

Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg He Glu Leu Lys Gly 
115 120 125 

3 5 ATT GAT TTT AAA GAA GAT GGA AAC ATT CTT GGA CAC AAA TTG GAA TAC 432 

He Asp Phe Lys Glu Asp Gly Asn He Leu Gly His Lys Leu Glu Tyr 
130 135 140 y 



1 92 



240 



28B 



480 



GGA AT. AAA GTT AAC TTC AAA ATT AGA CAC AAC ATT GAA GAT GGA AGC 52 8 

J( - Gly Ile U Y S Val Asn phe -ys He Arg His Asn He Glu Asp Gly Ser 

4:r? 1S5 170 175 

GTT CAA CTA GCA GAC CAT TAT CAA CAA AAT ACT CCA ATT GGC GAT GGC 5 76 

Val Gin Leu Ala Asp His Tyr Gin Gin Asn Thr Pro Ile Gly Asp Gly 
180 iB5 19 J 

CCT GTC CTT TTA CCA GAC AAC CAT TAC CTG TCC ACA CAA TCT GCC CTT 624 
Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gin Ser Ala Leu 
195 200 205 

TCG AAA GAT CCC AAC GAA AAG AGA GAC CAC ATG GTC CTT CTT GAG TTT 6 72 

Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe 
210 215 220 

u T t ^ A GCT GGG ATT ACA CAT GGC ATG GAT GAA CTA TAC AAA TAA 72 0 

Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys * 
225 230 235 240 
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(2) INFORMATION FOR SEC ID NO : 2 : 

(l) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 239 amino acids 
5 (B) TYPE: ammo acid 

(D) TOPOLOGY: linear 

<ii> MOLECULE TYPE: protein 

10 (xi ) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 

Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1 5 10 15

Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
20 25 30

Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
35 40 45

Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
50 55 60

Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys
65 70 75 80

Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
85 90 95

Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
100 105 110

Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
115 120 125

Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
130 135 140

Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160

Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
165 170 175

Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
180 185 190

Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
195 200 205

Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
210 215 220

Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..35
(D) OTHER INFORMATION: /note= "oligonucleotide sense #16417"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GGAGGCGCGC AAGAAATGGC TAGCAAAGGA GAAGA 



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..36
(D) OTHER INFORMATION: /note= "oligonucleotide antisense pri #16418"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GCGGGATCCT TATTTGTATA GTTCATCCAT GCCATG 
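
As a consistency check, the two primers of SEQ ID NO: 3 and SEQ ID NO: 4 can be compared with the coding sequence of SEQ ID NO: 1. The Python sketch below is illustrative only and is not part of the original listing. The helper names `reverse_complement` and `translate` are hypothetical, the codon table is deliberately limited to the codons that occur in these two primers, and the sketch assumes that the sense primer is read from its first ATG and that the antisense primer anneals to the 3' end of the coding strand.

```python
# Primer sequences copied from SEQ ID NO: 3 and SEQ ID NO: 4 (blanks removed).
SENSE_PRIMER = "GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA"
ANTISENSE_PRIMER = "GCGGGATCCTTATTTGTATAGTTCATCCATGCCATG"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Partial codon table (standard genetic code): only codons found in the primers.
CODONS = {
    "ATG": "M", "GCT": "A", "AGC": "S", "AAA": "K", "GGA": "G", "GAA": "E",
    "CAT": "H", "GGC": "G", "GAT": "D", "CTA": "L", "TAC": "Y", "TAA": "*",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate(seq: str) -> str:
    """Translate complete codons from position 0, stopping after a stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODONS[seq[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


# Sense primer: read in frame from its first ATG.
start = SENSE_PRIMER.find("ATG")
sense_peptide = translate(SENSE_PRIMER[start:])

# Antisense primer: convert to the coding strand, then translate.
antisense_rc = reverse_complement(ANTISENSE_PRIMER)
c_terminal_peptide = translate(antisense_rc)

print(sense_peptide)
print(antisense_rc)
print(c_terminal_peptide)
```

Under these assumptions the sense primer encodes Met Ala Ser Lys Gly Glu, the first six residues of SEQ ID NO: 2, and the reverse complement of the antisense primer encodes His Gly Met Asp Glu Leu Tyr Lys followed by a stop codon, the last eight residues. The flanking GCGCGC and GGATCC motifs correspond to the recognition sequences of BssHII and BamHI, respectively.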



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6238 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION: 1..6238
(D) OTHER INFORMATION: /note= "pFRED7"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:








GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG 60
CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 120
CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 180
TTAGGGTTAG GCGTTTTGCG CTGCTTCGCC TCGAGGCCTG GCCATTGCAT ACGTTGTATC 240
CATATCATAA TATGTACATT TATATTGGCT CATGTCCAAC ATTACCGCCA TGTTGACATT 300
GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 360
TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC 420
CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC 480
ATTGACGTCA ATGGGTGGAG TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 540
ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT 600
ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 660
TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG 720
ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC 780
AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG 840
GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG 900
CCTGGAGACG CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC 960
TCCGCGGGCG CGCAAGAAAT GGCTAGCAAA GGAGAAGAAC TCTTCACTGG AGTTGTCCCA 1020
ATTCTTGTTG AATTAGATGG TGATGTTAAT GGGCACAAAT TTTCTGTCAG TGGAGAGGGT 1080
GAAGGTGATG CAACATACGG AAAACTTACC CTTAAATTTA TTTGCACTAC TGGAAAACTA 1140
CCTGTTCCAT GGCCAACACT TGTCACTACT TTCTCTTATG GTGTTCAATG CTTTTCAAGA 1200
TACCCGGATC ATATGAAACG GCATGACTTT TTCAAGAGTG CCATGCCCGA AGGTTATGTA 1260
CAGGAAAGAA CTATATTTTT CAAAGATGAC GGGAACTACA AGACACGTGC TGAAGTCAAG 1320
TTTGAAGGTG ATACCCTTGT TAATAGAATC GAGTTAAAAG GTATTGATTT TAAAGAAGAT 1380
GGAAACATTC TTGGACACAA ATTGGAATAC AACTATAACT CACACAATGT ATACATCATG 1440
GCAGACAAAC AAAAGAATGG AATCAAAGTT AACTTCAAAA TTAGACACAA CATTGAAGAT 1500
GGAAGCGTTC AACTAGCAGA CCATTATCAA CAAAATACTC CAATTGGCGA TGGCCCTGTC 1560
CTTTTACCAG ACAACCATTA CCTGTCCACA CAATCTGCCC TTTCGAAAGA TCCCAACGAA 1620
AAGAGAGACC ACATGGTCCT TCTTGAGTTT GTAACAGCTG CTGGGATTAC ACATGGCATG 1680
GATGAACTAT ACAAATAAGG ATCCACTAGT AACGGCCGCC AGTGTGCTGG AATTCTGCAG 1740
ATATCCATCA CACTGGCGGC CGCTCGAGCA TGCATCTAGA GGGCCCTATT CTATAGTGTC 1800
ACCTAAATGC TAGAGCTCGC TGATCAGCCT CGACTGTGCC TTCTAGTTGC CAGCCATCTG 1860
TTGTTTGCCC CTCCCCCGTG CCTTCCTTGA CCCTGGAAGG TGCCACTCCC ACTGTCCTTT 1920
CCTAATAAAA TGAGGAAATT GCATCGCATT GTCTGAGTAG GTGTCATTCT ATTCTGGGGG 1980
GTGGGGTGGG GCAGGACAGC AAGGGGGAGG ATTGGGAAGA CAATAGCAGG CATGCTGGGG 2040
ATGCGGTGGG CTCTATGGCT TCTGAGGCGG AAAGAACCAG CTGGGGCTCT AGGGGGTATC 2100
CCCACGCGCC CTGTAGCGGC GCATTAAGCG CGGCGGGTGT GGTGGTTACG CGCAGCGTGA 2160
CCGCTACACT TGCCAGCGCC CTAGCGCCCG CTCCTTTCGC TTTCTTCCCT TCCTTTCTCG 2220
CCACGTTCGC CGGCTTTCCC CGTCAAGCTC TAAATCGGGG CATCCCTTTA GGGTTCCGAT 2280
TTAGTGCTTT ACGGCACCTC GACCCCAAAA AACTTGATTA GGGTGATGGT TCACGTAGTG 2340
GGCCATCGCC CTGATAGACG GTTTTTCGCC CTTTGACGTT GGAGTCCACG TTCTTTAATA 2400
GTGGACTCTT GTTCCAAACT GGAACAACAC TCAACCCTAT CTCGGTCTAT TCTTTTGATT 2460
TATAAGGGAT TTTGGGGATT TCGGCCTATT GGTTAAAAAA TGAGCTGATT TAACAAAAAT 2520
TTAACGCGAA TTAATTCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT CCCCAGGCTC 2580
CCCAGGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA GTCAGCAACC AGGTGTGGAA 2640
AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA 2700
CCATAGTCCC GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 2760
CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC GCCTCTGCCT 2820
CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG CCTAGGCTTT TGCAAAAAGC 2880
TCCCGGGAGC TTGTATATCC ATTTTCGGAT CTGATCAAGA GACAGGATGA GGATCGTTTC 2940
GCATGATTGA ACAAGATGGA TTGCACGCAG GTTCTCCGGC CGCTTGGGTG GAGAGGCTAT 3000
TCGGCTATGA CTGGGCACAA CAGACAATCG GCTGCTCTGA TGCCGCCGTG TTCCGGCTGT 3060
CAGCGCAGGG GCGCCCGGTT CTTTTTGTCA AGACCGACCT GTCCGGTGCC CTGAATGAAC 3120
TGCAGGACGA GGCAGCGCGG CTATCGTGGC TGGCCACGAC GGGCGTTCCT TGCGCAGCTG 3180
TGCTCGACGT TGTCACTGAA GCGGGAAGGG ACTGGCTGCT ATTGGGCGAA GTGCCGGGGC 3240
AGGATCTCCT GTCATCTCAC CTTGCTCCTG CCGAGAAAGT ATCCATCATG GCTGATGCAA 3300
TGCGGCGGCT GCATACGCTT GATCCGGCTA CCTGCCCATT CGACCACCAA GCGAAACATC 3360
GCATCGAGCG AGCACGTACT CGGATGGAAG CCGGTCTTGT CGATCAGGAT GATCTGGACG 3420
AAGAGCATCA GGGGCTCGCG CCAGCCGAAC TGTTCGCCAG GCTCAAGGCG CGCATGCCCG 3480
ACGGCGAGGA TCTCGTCGTG ACCCATGGCG ATGCCTGCTT GCCGAATATC ATGGTGGAAA 3540
ATGGCCGCTT TTCTGGATTC ATCGACTGTG GCCGGCTGGG TGTGGCGGAC CGCTATCAGG 3600
ACATAGCGTT GGCTACCCGT GATATTGCTG AAGAGCTTGG CGGCGAATGG GCTGACCGCT 3660
TCCTCGTGCT TTACGGTATC GCCGCTCCCG ATTCGCAGCG CATCGCCTTC TATCGCCTTC 3720
TTGACGAGTT CTTCTGAGCG GGACTCTGGG GTTCGAAATG ACCGACCAAG CGACGCCCAA 3780
CCTGCCATCA CGAGATTTCG ATTCCACCGC CGCCTTCTAT GAAAGGTTGG GCTTCGGAAT 3840
CGTTTTCCGG GACGCCGGCT GGATGATCCT CCAGCGCGGG GATCTCATGC TGGAGTTCTT 3900
CGCCCACCCC AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC 3960
AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT 4020
CAATGTATCT TATCATGTCT GTATACCGTC GACCTCTAGC TAGAGCTTGG CGTAATCATG 4080
GTCATAGCTG TTTCCTGTGT GAAATTGTTA TCCGCTCACA ATTCCACACA ACATACGAGC 4140
CGGAAGCATA AAGTGTAAAG CCTGGGGTGC CTAATGAGTG AGCTAACTCA CATTAATTGC 4200
GTTGCGCTCA CTGCCCGCTT TCCAGTCGGG AAACCTGTCG TGCCAGCTGC ATTAATGAAT 4260
CGGCCAACGC GCGGGGAGAG GCGGTTTGCG TATTGGGCGC TCTTCCGCTT CCTCGCTCAC 4320
TGACTCGCTG CGCTCGGTCG TTCGGCTGCG GCGAGCGGTA TCAGCTCACT CAAAGGCGGT 4380
AATACGGTTA TCCACAGAAT CAGGGGATAA CGCAGGAAAG AACATGTGAG CAAAAGGCCA 4440
GCAAAAGGCC AGGAACCGTA AAAAGGCCGC GTTGCTGGCG TTTTTCCATA GGCTCCGCCC 4500
CCCTGACGAG CATCACAAAA ATCGACGCTC AAGTCAGAGG TGGCGAAACC CGACAGGACT 4560
ATAAAGATAC CAGGCGTTTC CCCCTGGAAG CTCCCTCGTG CGCTCTCCTG TTCCGACCCT 4620
GCCGCTTACC GGATACCTGT CCGCCTTTCT CCCTTCGGGA AGCGTGGCGC TTTCTCAATG 4680
CTCACGCTGT AGGTATCTCA GTTCGGTGTA GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA 4740
CGAACCCCCC GTTCAGCCCG ACCGCTGCGC CTTATCCGGT AACTATCGTC TTGAGTCCAA 4800
CCCGGTAAGA CACGACTTAT CGCCACTGGC AGCAGCCACT GGTAACAGGA TTAGCAGAGC 4860
GAGGTATGTA GGCGGTGCTA CAGAGTTCTT GAAGTGGTGG CCTAACTACG GCTACACTAG 4920
AAGGACAGTA TTTGGTATCT GCGCTCTGCT GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG 4980
TAGCTCTTGA TCCGGCAAAC AAACCACCGC TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA 5040
GCAGATTACG CGCAGAAAAA AAGGATCTCA AGAAGATCCT TTGATCTTTT CTACGGGGTC 5100
TGACGCTCAG TGGAACGAAA ACTCACGTTA AGGGATTTTG GTCATGAGAT TATCAAAAAG 5160
GATCTTCACC TAGATCCTTT TAAATTAAAA ATGAAGTTTT AAATCAATCT AAAGTATATA 5220
TGAGTAAACT TGGTCTGACA GTTACCAATG CTTAATCAGT GAGGCACCTA TCTCAGCGAT 5280
CTGTCTATTT CGTTCATCCA TAGTTGCCTG ACTCCCCGTC GTGTAGATAA CTACGATACG 5340
GGAGGGCTTA CCATCTGGCC CCAGTGCTGC AATGATACCG CGAGACCCAC GCTCACCGGC 5400
TCCAGATTTA TCAGCAATAA ACCAGCCAGC CGGAAGGGCC GAGCGCAGAA GTGGTCCTGC 5460
AACTTTATCC GCCTCCATCC AGTCTATTAA TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC 5520
GCCAGTTAAT AGTTTGCGCA ACGTTGTTGC CATTGCTACA GGCATCGTGG TGTCACGCTC 5580
GTCGTTTGGT ATGGCTTCAT TCAGCTCCGG TTCCCAACGA TCAAGGCGAG TTACATGATC 5640
CCCCATGTTG TGCAAAAAAG CGGTTAGCTC CTTCGGTCCT CCGATCGTTG TCAGAAGTAA 5700
GTTGGCCGCA GTGTTATCAC TCATGGTTAT GGCAGCACTG CATAATTCTC TTACTGTCAT 5760
GCCATCCGTA AGATGCTTTT CTGTGACTGG TGAGTACTCA ACCAAGTCAT TCTGAGAATA 5820
GTGTATGCGG CGACCGAGTT GCTCTTGCCC GGCGTCAATA CGGGATAATA CCGCGCCACA 5880
TAGCAGAACT TTAAAAGTGC TCATCATTGG AAAACGTTCT TCGGGGCGAA AACTCTCAAG 5940
GATCTTACCG CTGTTGAGAT CCAGTTCGAT GTAACCCACT CGTGCACCCA ACTGATCTTC 6000
AGCATCTTTT ACTTTCACCA GCGTTTCTGG GTGAGCAAAA ACAGGAAGGC AAAATGCCGC 6060
AAAAAAGGGA ATAAGGGCGA CACGGAAATG TTGAATACTC ATACTCTTCC TTTTTCAATA 6120
TTATTGAAGC ATTTATCAGG GTTATTGTCT CATGAGCGGA TACATATTTG AATGTATTTA 6180
GAAAAATAAA CAAATAGGGG TTCCGCGCAC ATTTCCCCGA AAAGTGCCAC CTGACGTC 6238




(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3699 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..3699
(D) OTHER INFORMATION: /note= "pBSGFP"





WO 97/42320 



PCT/US97/07625

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
GGAAATTGTA AACGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60
ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120
GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180
CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240
CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300
CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360
AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420
CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCG CGCCATTCGC CATTCAGGCT 480
GCGCAACTGT TGGGAAGGGC GATCGGTGCG GGCCTCTTCG CTATTACGCC AGCTGGCGAA 540
AGGGGGATCT CCTGCAAGGC GATTAAGTTG GGTAACGCCA GGGTTTTCCC AGTCACGACG 600
TTGTAAAACC ACGCCCAGTG AATTGTAATA CGACTCACTA TAGGGCGAAT TGGGTACCGG 660
GCCCCCCCTC GAGGTCGACC GTATCGATAA GCTTGATGAT CCTTATTTGT ATAGTTCATC 720
CATGCCATCT GTAATCCCAC CAGCTGTTAC AAACTCAAGA AGGACCATGT GGTCTCTCTT 780
TTCGTTGGGA TCTTTCCAAA GGGCAGATTG TGTGGACAGG TAATGGTTGT CTGGTAAAAG 840
GACAGGGCCA TCCCCAATTC CAGTATTTTG TTGATAATGG TCTGCTAGTT GAACGCTTCC 900
ATCTTCAATC TTGTGTCTAA TTTTGAAGTT AACTTTGATT CCATTCTTTT GTTTGTCTGC 960
CATGATGTAT ACATTGTGTG AGTTATAGTT GTATTCCAAT TTGTGTCCAA GAATGTTTCC 1020
ATCTTCTTTA AAATCAATAC CTTTTAACTC GATTCTATTA ACAAGGGTAT CACCTTCAAA 1080
CTTGACTTCA CCACGTGTCT TGTAGTTCCC GTCATCTTTG AAAAATATAG TTCTTTCCTG 1140
TACATAACCT TCGGGCATGG CACTCTTGAA AAAGTCATGC CGTTTCATAT GATCCGGGTA 1200
TCTTGAAAAG CATTCAACAC CATAAGAGAA AGTAGTGACA AGTGTTGGCC ATGGAACAGG 1260
TAGTTTTCCA GTAGTGCAAA TAAATTTAAG GGTAAGTTTT CCGTATGTTG CATCACCTTC 1320
ACCCTCTCCA CTGACAGAAA ATTTGTGCCC ATTAACATCA CCATCTAATT CAACAAGAAT 1380
TGGGACAACT CCAGTGAAGA GTTCTTCTCC TTTGCTAGCC ATTTCTTGCG CGATCGAATT 1440
CCTGCAGCCC GGGGGATCCA CTAGTTCTAG AGCGGCCGCC ACCGCGGTGG AGCTCCAGCT 1500
TTTGTTCCCT TTAGTGAGGG TTAATTCCGA GCTTGGCGTA ATCATGGTCA TAGCTGTTTC 1560
CTGTGTGAAA TTGTTATCCG CTCACAATTC CACACAACAT ACGAGCCGGA AGCATAAAGT 1620
GTAAAGCCTG GGGTGCCTAA TGAGTGAGCT AACTCACATT AATTGCGTTG CGCTCACTGC 1680
CCGCTTTCCA GTCGGGAAAC CTGTCGTGCC AGCTGCATTA ATGAATCGGC CAACGCGCGG 1740
GGAGAGGCGG TTTGCGTATT GGGCGCTCTT CCGCTTCCTC GCTCACTGAC TCGCTGCGCT 1800
CGGTCGTTCG GCTGCGGCGA GCGGTATCAG CTCACTCAAA GGCGGTAATA CGGTTATCCA 1860
CAGAATCAGG GGATAACGCA GGAAAGAACA TGTGAGCAAA AGGCCAGCAA AAGGCCAGGA 1920
ACCGTAAAAA GGCCGCGTTG CTGGCGTTTT TCCATAGGCT CCGCCCCCCT GACGAGCATC 1980
ACAAAAATCG ACGCTCAAGT CAGAGGTGGC GAAACCCGAC AGGACTATAA AGATACCAGG 2040






CGTTTCCCCC TGGAAGCTCC CTCGTGCGCT CTCCTGTTCC GACCCTGCCG CTTACCGGAT 2100
ACCTGTCCGC CTTTCTCCCT TCGGGAAGCC TGGCGCTTTC TCATAGCTCA CGCTGTAGGT 2160
ATCTCAGTTC GGTGTAGGTC GTTCGCTCCA AGCTGGGCTG TGTGCACGAA CCCCCCGTTC 2220
AGCCCGACCG CTGCGCCTTA TCCGGTAACT ATCGTCTTGA GTCCAACCCG GTAAGACACG 2280
ACTTATCGCG ACTGGCAGCA GCCACTGGTA ACAGGATTAG CAGAGCGAGG TATGTAGGCG 2340
GTGCTACAGA GTTCTTGAAG TGGTGGCCTA ACTACGGCTA CACTAGAAGG ACAGTATTTG 2400
GTATCTGCGG TCTGCTGAAG CCAGTTAGGT TGGGAAAAAG AGTTGGTAGC TCTTGATCCG 2460
GCAAACAAAC CACCGCTGGT AGCGGTGGTT TTTTTGTTTG CAAGCAGCAG ATTACGGGCA 2520
GAAAAAAAGG ATCTCAAGAA GATCCTTTGA TGTTTTCTAC GGGGTGTGAC GCTCAGTGGA 2580
ACGAAAACTC ACGTTAAGGG ATTTTGGTCA TGAGATTATC AAAAAGGATC TTCACCTAGA 2640
TGCTTTTAAA TTAAAAATGA AGTTTTAAAT CAATCTAAAG TATATATGAG TAAAGTTGGT 2700
CTGACAGTTA CCAATGCTTA ATCAGTGAGG CACGTATGTC AGCGATGTGT CTATTTCGTT 2760
CATCCATAGT TGCCTGAGTC CCCGTCGTGT AGATAACTAC GATACGGGAG GGCTTACCAT 2820
CTGGCCCCAG TGCTGCAATG ATAGCGCGAG ACGCACGCTC ACGGGCTCCA GATTTATCAG 2880
CAATAAACGA GCCAGCCGGA AGGGCCGAGC GGAGAAGTGG TCCTGCAACT TTATCCGCCT 2940
CCATCCAGTC TATTAATTGT TGCCGGGAAG CTAGAGTAAG TAGTTCGCCA GTTAATAGTT 3000
TGCGCAACGT TGTTGCCATT GGTACAGGCA TCGTGGTGTC ACGCTCGTCG TTTGGTATGG 3060
CTTCATTCAG CTCCGGTTCC CAACGATCAA GGCGAGTTAC ATGATGCCCG ATGTTGTGCA 3120
AAAAAGGGGT TAGCTCCTTC GGTCCTCCGA TCGTTGTCAG AAGTAAGTTG GCCGCAGTGT 3180
TATCACTCAT GGTTATGGCA GCACTGGATA ATTCTCTTAC TGTCATGCCA TCCGTAAGAT 3240
GCTTTTCTGT GACTGGTGAG TACTCAACCA AGTCATTCTG AGAATAGTGT ATGCGGCGAC 3300
CGAGTTGCTC TTGCCCGGGG TCAATAGGGG ATAATACCGC GCCACATAGC AGAACTTTAA 3360
AAGTGCTCAT CATTGGAAAA CGTTCTTCGG GGCGAAAACT CTGAAGGATC TTACCGCTGT 3420
TGAGATCCAG TTCGATGTAA CCCACTCGTG CACCCAACTG ATCTTCAGCA TCTTTTACTT 3480
TCACCAGCGT TTCTGGGTGA GCAAAAACAG GAAGGGAAAA TGCCGCAAAA AAGGGAATAA 3540
GGGCGAGACG GAAATGTTGA ATACTCATAC TCTTCCTTTT TCAATATTAT TGAAGCATTT 3600
ATCAGGGTTA TTGTCTCATG AGCGGATACA TATTTGAATG TATTTAGAAA AATAAACAAA 3660
TAGGGGTTCG GCGCACATTT GGCCGAAAAG TGCCAGCTG 3699

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6361 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION: 1..6361
(D) OTHER INFORMATION: /note= "pFRED13"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TTCTCATCTT TGACAGCTTA TCATCGATAA GCTTTAATGC GGTAGTTTAT CACAGTTAAA 60
TTGCTAACGC AGTCAGGCAC CGTGTATGAA ATCTAACAAT GCGCTCATCG TCATCCTCGG 120
CACCGTCACC CTGGATGCTG TAGGCATAGG CTTGGTTATG CCGGTACTGC CGGGCCTCTT 180
GCGGGATATC CGGATATAGT TCCTCCTTTC AGCAAAAAAC CCCTCAAGAC CCGTTTAGAG 240
GCCCCAAGGG GTTATGCTAG TTATTGCTCA GCGGTGGCAG CAGCCAACTC AGCTTCCTTT 300
CGGGCTTTGT TAGCAGCCGG ATCCTTATTT GTATAGTTCA TCCATGCCAT GTGTAATCCC 360
AGCAGCTGTT ACAAACTCAA GAAGGACCAT GTGGTCTCTC TTTTCGTTGG GATCTTTCGA 420
AAGGGCAGAT TGTGTGGACA GGTAATGGTT GTCTGGTAAA AGGACAGGGC CATCGCCAAT 480
TGGAGTATTT TGTTGATAAT GGTCTGCTAG TTGAACGCTT CCATCTTCAA TGTTGTGTCT 540
AATTTTGAAG TTAACTTTGA TTCCATTCTT TTGTTTGTCT GCCATGATGT ATACATTGTG 600
TGAGTTATAG TTGTATTCCA ATTTGTGTCC AAGAATGTTT CCATCTTCTT TAAAATCAAT 660
ACCTTTTAAC TCGATTCTAT TAACAAGGGT ATCACCTTCA AACTTGACTT CAGCACGTGT 720
CTTGTAGTTC CCGTCATCTT TGAAAAATAT AGTTCTTTCC TGTACATAAC CTTCGGGCAT 780
GGCACTCTTG AAAAAGTCAT GCCGTTTCAT ATGATCCGGG TATCTTGAAA AGCATTGAAC 840
ACCATAAGAG AAAGTAGTGA CAAGTGTTGG CCATGGAACA GGTAGTTTTC CAGTAGTGCA 900
AATAAATTTA AGGGTAAGTT TTCCGTATGT TGCATCACCT TCACCCTCTC CACTGACAGA 960
AAATTTGTGC CCATTAACAT CACCATCTAA TTCAACAAGA ATTGGGACAA CTCCAGTGAA 1020
GAGTTCTTCT CCTTTGCTAG CCATATGTAT ATCTCCTTCT TAAAGTTAAA CAAAATTATT 1080
TCTAGAGGGG AATTGTTATC CGCTCACAAT TCCCCTATAG TGAGTCGTAT TAATTTCGCG 1140
GGATCGAGAT CTCGATCCTC TACGCCGGAC GCATCGTGGC CGGCATCACC GGCGCCACAG 1200
GTGCGGTTGC TGGCGCCTAT ATCGCCGACA TCACCGATGG GGAAGATCGG GCTCGCCACT 1260
TCGGGCTCAT GAGCGCTTGT TTCGGCGTGG GTATGGTGGC AGGCCCCGTG GCCGGGGGAC 1320
TGTTGGGCGC CATCTCCTTG CATGCACCAT TCCTTGCGGC GGCGGTGCTC AACGGCCTCA 1380
ACCTACTACT GGGCTGCTTC CTAATGCAGG AGTCGCATAA GGGAGAGCGT CGAGATCCCG 1440
GACACCATCG AATGGCGCAA AACCTTTCGC GGTATGGCAT GATAGCGCCC GGAAGAGAGT 1500
CAATTCAGGG TGGTGAATGT GAAACCAGTA ACGTTATACG ATGTCGCAGA GTATGCCGGT 1560
GTCTCTTATC AGACCGTTTC CCGCGTGGTG AACCAGGCCA GCCACGTTTC TGCGAAAACG 1620
CGGGAAAAAG TGGAAGCGGC GATGGCGGAG CTGAATTACA TTCCCAACCC CGTGGCACAA 1680
CAACTGGCGG GCAAACAGTC GTTGCTGATT GGCGTTGCCA CCTCCAGTCT GGCCCTGCAC 1740
GCGCCGTCGC AAATTGTCGC GGCGATTAAA TCTCGCGCCG ATCAACTGGG TGCCAGCGTG 1800
GTGGTGTCGA TGGTAGAACG AAGCGGCGTC GAAGCCTGTA AAGCGGCGGT GCACAATCTT 1860
CTCGCGCAAC GCGTCAGTGG GCTGATCATT AACTATCCGC TGGATGACCA GGATGCCATT 1920
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GACGGTACGC 


GACTGGGCGT 


GG AG CATCTG 
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CATTAAGTTC 
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2100 






/— * •"»-« /"-» 

- jO^ I L>Cjv_ a <o 


G CATAAATAT 


CTCACTCGCA 


ATCAAATTCA 


GCCGATAGCG 


2160 
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GAACGGGAAG 




IbLlAIGi CC 


GGTTTTCAAG 


AAACCATGCA 


AATGCTGAAT 


2220 






J 1 ^ clAl lbv 


GATGCTbG: i 


G C CAACGATC 


AGATGGCGCT 


GGGCGCAATG 
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nTr* r r"T' 

bb rbjlb»CGTT 


G GTG CG GAT A 


TCTCGGTAGT 


GGG ATACGAC 


234 0 


1 5 




ALAbL I LA TO 


TTATATCCCG 


CCGTTAACCA 


CCATC AAACA 


GGATTTTCGC 


2400 




CTGCTGGGGC 




CC A CCf~*CT* r T'r-' 


GTGCAACTCT 


CTCAGGGCCA 


GGCGGTGAAG 


2460 


20 


>-> VJ r\J- \ 1 v_>ALj v_ 


TGTTGCCCGT 


LTCACTGGTG 


AAAAGAAAAA 


CCACCCTGGC 


GCCCAATACG 


2520 
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11 1 CCCbCb \_ 


GTTGGCCGAT 


TCATTAATG C 


AGCTGGCACG 


ACAGGTTTCC 


2580 






■jw\j\ajCAGTG 


AGCGCAACGC 


AATTAATGTA 


AGTTAGCTCA 


CTCATTAGGC 


2640 
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- GnCC^ATGC 


CCTTGAGAGC 


CTTCAACCCA 


GTCAGCTCCT 


TCCGGTGGGC 


2700 




r - * c c c <"■* <™ /~* » 


AGTATCGTCG 


CCG CACTTAT 


GACTGTCTTC 


TTTATCATGC 


AACTCGTAGG 


2760 
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Ti t\ cr^TC c 
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GGGTCATTTT 


CGGCGAGGAC 


CGCTTTCGCT 


GGAGCGCGAC 


2820 


CxATGATCGG-^ 


CTGTCGCTTG 


CGGTATTCGG 


AATCTTGCAC 


GCCCTCGCTC 


AAGCCTTCGT 


2880 




LALTGGTCCC 


GCCACCAAAC 


GTTTCGGCGA 


GAAGGAGGCC 


ATTATCGCCG 


GCATGGCGGC 


2940 
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CGACGCGCTG 


GGGTACGTCT 


TGCTGGCGTT 


CG CGACGCGA 


GGCTGGATGG 


CCTTCCCCAT 


3000 




rATGATTCT . 


CTCGCTTCCG 


GCGGCATCGG 


GATGCCCGCG 


TTGCAGGCCA 


TGCTGTCCAG 


3060 
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bLAbbTAGAT 


GACGACCATC 


AGGGACAGGT 


TCAAGGATCG 


CTCGCGGCTC 


TTACCAGCCT 


3120 




AACTTCGATC 


ACTCGACCGC 


TGATCGTCAC 


GGCGATTTAT 


GCCGCCTCGG 


CG AG CACATG 


3180 




bAACGGGTTG 


GCATGGATTG 


TAGGGGCCGC 


CCTATACCTT 


GTCTGCCTCC 


CCGCGTTGCG 


3240 
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TGGAGCCGGG 


CCACCTCGAC 


CTGAATGGAA 


GCCGGCGGCA 
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CTCCAAGAAT 
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CAATTCTTGC 


GGAGAACTGT 


GAATGCGCAA 


3360 
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GGCAGAACAT 


ATCCATCGCG 
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CCAGCAGCCG 


CACGCGGCGC 


3420 




vjCGTTGGGTC 


CTGGCCAGGG 


GTGCG CATG A 


TCGTGCTCCT 


GTCGTTGAGG 
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' — * ^ /- * 

'•-J 1 Ou^Ouuo 


TTGCCTTACT 


GGTTAGCAGA 


ATGAATCACC 


GATACGCGAG 
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GCGACTGCTG 


CTG CAAAACG 


TCTGCGACCT 


GAGCAACAAC 


ATGAATGGTC 


3600 




x^l LGGTTTCC 


GTGTTTCGTA 


AAGTCTGGAA 


ACGCGGAAGT 


CAGCGCCCTG 


CACCATTATG 


3660 


60 


TTCCGGATCT GCATCGCAGG ATGGTGCTGG CTACCCTGTG GAACACCTAC ATCTGTATTA 3720
ACGAAGCGCT GGCATTGACC CTGAGTGATT TTTCTCTGGT CCCGCCGCAT CCATACCGCC 3780
AGTTGTTTAC CCTCACAACG TTCCAGTAAC CGGGCATGTT CATCATCAGT AACCCGTATC 3840
GTGAGGATCC TCTCTCGTTT CATCGGTATC ATTACCCCCA TGAACAGAAA TCCCCCTTAC 3900
ACGGAGGCAT CAGTGACCAA ACAGGAAAAA ACCGCCCTTA ACATGGCCCG CTTTATCAGA 3960
AGCCAGACAT TAACGCTTCT GGAGAAAGTC AACGAGCTGG ACGCGGATGA ACAGGCAGAC 4020
ATCTGTGAAT CGCTTCACGA CCACGCTGAT GAGCTTTACC GCAGCTGCCT CGCGCGTTTC 4080
GGTGATGACG GTGAAAACCT CTGACACATG CAGCTCCCGG AGACGGTCAC AGCTTGTCTG 4140
TAAGCGGATG CCGGGAGCAG ACAAGCCCGT CAGGGCGCGT CAGCGGGTGT TGGCGGGTGT 4200
CGGGGCGCAG CCATGACCCA GTCACGTAGC GATAGCGGAG TGTATACTGG CTTAACTATG 4260
CGGCATCAGA GCAGATTGTA CTGAGAGTGC ACCATATATG CGGTGTGAAA TACCGCACAG 4320
ATGCGTAAGG AGAAAATACC GCATCAGGCG CTCTTCCGCT TCCTCGCTCA CTGACTCGCT 4380
GCGCTCGGTC GTTCGGCTGC GGCGAGCGGT ATCAGCTCAC TCAAAGGCGG TAATACGGTT 4440
ATCCACAGAA TCAGGGGATA ACGCAGGAAA GAACATGTGA GCAAAAGGCC AGCAAAAGGC 4500
CAGGAACCGT AAAAAGGCCG CGTTGCTGGC GTTTTTCCAT AGGCTCCGCC CCCCTGACGA 4560
GCATCACAAA AATCGACGCT CAAGTCAGAG GTGGCGAAAC CCGACAGGAC TATAAAGATA 4620
CCAGGCGTTT CCCCCTGGAA GCTCCCTCGT GCGCTCTCCT GTTCCGACCC TGCCGCTTAC 4680
CGGATACCTG TCCGCCTTTC TCCCTTCGGG AAGCGTGGCG CTTTCTCATA GCTCACGCTG 4740
TAGGTATCTC AGTTCGGTGT AGGTCGTTCG CTCCAAGCTG GGCTGTGTGC ACGAACCCCC 4800
CGTTCAGCCC GACCGCTGCG CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGTAAG 4860
ACACGACTTA TCGCCACTGG CAGCAGCCAC TGGTAACAGG ATTAGCAGAG CGAGGTATGT 4920
AGGCGGTGCT ACAGAGTTCT TGAAGTGGTG GCCTAACTAC GGCTACACTA GAAGGACAGT 4980
ATTTGGTATC TGCGCTCTGC TGAAGCCAGT TACCTTCGGA AAAAGAGTTG GTAGCTCTTG 5040
ATCCGGCAAA CAAACCACCG CTGGTAGCGG TGGTTTTTTT GTTTGCAAGC AGCAGATTAC 5100
GCGCAGAAAA AAAGGATCTC AAGAAGATCC TTTGATCTTT TCTACGGGGT CTGACGCTCA 5160
GTGGAACGAA AACTCACGTT AAGGGATTTT GGTCATGAGA TTATCAAAAA GGATCTTCAC 5220
CTAGATCCTT TTAAATTAAA AATGAAGTTT TAAATCAATC TAAAGTATAT ATGAGTAAAC 5280
TTGGTCTGAC AGTTACCAAT GCTTAATCAG TGAGGCACCT ATCTCAGCGA TCTGTCTATT 5340
TCGTTCATCC ATAGTTGCCT GACTCCCCGT CGTGTAGATA ACTACGATAC GGGAGGGCTT 5400
ACCATCTGGC CCCAGTGCTG CAATGATACC GCGAGACCCA CGCTCACCGG CTCCAGATTT 5460
ATCAGCAATA AACCAGCCAG CCGGAAGGGC CGAGCGCAGA AGTGGTCCTG CAACTTTATC 5520
CGCCTCCATC CAGTCTATTA ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA 5580
TAGTTTGCGC AACGTTGTTG CCATTGCTSC AGGCATCGTG GTGTCACGCT CGTCGTTTGG 5640
TATGGCTTCA TTCAGCTCCG GTTCCCAACG ATCAAGGCGA GTTACATGAT CCCCCATGTT 5700
GTGCAAAAAA GCGGTTAGCT CCTTCGGTCC TCCGATCGTT GTCAGAAGTA AGTTGGCCGC 5760
AGTGTTATCA CTCATGGTTA TGGCAGCACT GCATAATTCT CTTACTGTCA TGCCATCCGT 5820
AAGATCCTTT TCTGTGACTG GTGAGTACTC AACCAAGTCA TTCTGAGAAT AGTGTATGCG 5880
GCGACCGAGT TGCTCTTGCC CGGCGTCAAC ACGGGATAAT ACCGCGCCAC ATAGCAGAAC 5940
TTTAAAAGTG CTCATCATTG GAAAACCTTC TTCGGGGCGA AAACTCTCAA GGATCTTACC 6000
GCTTCTTGAGA TCCAGTTCGA TGTAACCCAC TCGTGCACCC AACTGATCTT CAGCATCTTT 6060
TACTTTCACC AGCGTTTCTG GGTGAGCAAA AACAGGAAGG CAAAATGCCG CAAAAAAGGG 6120
AATAAGGGCG ACACGGAAAT GTTGAATACT CATACTCTTC CTTTTTCAAT ATTATTGAAG 6180
CATTTATCAG GGTTATTGTC TCATGAGCGG ATACATATTT GAATGTATTT AGAAAAATAA 6240
ACAAATAGGG GTTCCGCGCA CATTTCCCCG AAAAGTGCCA CCTGACGTCT AAGAAACCAT 6300
TATTATCATG ACATTAACCT ATAAAAATAG GCGTATCACG AGGCCCTTTC GTCTTCAAGA 6360

10 * 6361 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 48 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..48
(D) OTHER INFORMATION: /note= "oligonucleotide #17422"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CAATTTGTGT CCCAGAATGT TGCCATCTTC CTTGAAGTCA ATACCTTT 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 47 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..47
(D) OTHER INFORMATION: /note= "oligonucleotide #17423"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GTCTTGTAGT TGCCGTCATC TTTGAAGAAG ATGCTCCTTT CCTGTAC 47 

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 52 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..52
(D) OTHER INFORMATION: /note= "oligonucleotide #17424"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CATGGAACAG GCAGTTTGCC AGTAGTGCAG ATGAACTTCA GGGTAAGTTT TC

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION: 1..40
(D) OTHER INFORMATION: /note= "oligonucleotide #17425"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CTCCACTGAC AGAGAACTTG TGGCCGTTAA CATCACCATC 40 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 47 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..47
(D) OTHER INFORMATION: /note= "oligonucleotide #17426"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CCATCTTCAA TGTTGTGGCG GGTCTTGAAG TTCACTTTGA TTCCATT 

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 41 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..41
(D) OTHER INFORMATION: /note= "oligonucleotide #1746S"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CGATAAGCTT GAGGATCCTC AGTTGTACAG TTCATCCATG C 
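
The antisense oligonucleotide of SEQ ID NO: 13 can be compared with the 3' end of the coding sequence of SEQ ID NO: 1 to see which bases it alters. The Python sketch below is illustrative only and not part of the original listing. The helper names `reverse_complement` and `mismatches` are hypothetical, and the comparison assumes that the oligonucleotide anneals so that its reverse complement lines up with the final seven codons (Met Asp Glu Leu Tyr Lys, stop) of SEQ ID NO: 1.

```python
# Antisense oligonucleotide copied from SEQ ID NO: 13 (blanks removed).
OLIGO_13 = "CGATAAGCTTGAGGATCCTCAGTTGTACAGTTCATCCATGC"
# Final seven codons of SEQ ID NO: 1: ATG GAT GAA CTA TAC AAA TAA.
WILD_TYPE_TAIL = "ATGGATGAACTATACAAATAA"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatches(a: str, b: str) -> list:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Convert the oligonucleotide to the coding-strand orientation.
sense_strand = reverse_complement(OLIGO_13)

# Anchor the comparison on the first nine bases shared with the wild-type tail.
anchor = sense_strand.find("ATGGATGAA")
oligo_tail = sense_strand[anchor:anchor + len(WILD_TYPE_TAIL)]

changed = mismatches(WILD_TYPE_TAIL, oligo_tail)
print(sense_strand)
print(oligo_tail)
print(changed)
```

Under these assumptions the oligonucleotide differs from the wild-type tail at three positions: CTA becomes CTG (both encode Leu), AAA becomes AAC (Lys to Asn), and the stop codon TAA becomes TGA. The same ending, CTG TAC AAC TGA, appears at the 3' end of SEQ ID NO: 16.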

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 849 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION: 1..849
(D) OTHER INFORMATION: /note= "pBSGFPsg11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:










ATGACCATGA TTACGCCAAG CTCGGAATTA ACCCTCACTA AAGGGAACAA AAGCTGGAGC 60
TCCACCGCGG TGGCGGCCGC TCTAGAACTA GTGGATCCCC CGGGCTGCAG GAATTCGATC 120
GCGCAAGAAA TGGCTAGCAA AGGAGAAGAA CTCTTCACTG GAGTTGTCCC AATTCTTGTT 180
GAATTAGATG GTGATGTTAA CGGCCACAAG TTCTCTGTCA GTGGAGAGGG TGAAGGTGAT 240
GCAACATACG GAAAACTTAC CCTGAAGTTC ATCTGCACTA CTGGCAAACT GCCTGTTCCA 300
TGGCCAACAC TTGTCACTAC TCTCTCTTAT GGTGTTCAAT GCTTTTCAAG ATACCCGGAT 360
CATATGAAAC GGCATGACTT TTTCAAGAGT GCCATGCCCG AAGGTTATGT ACAGGAAAGG 420
ACCATCTTCT TCAAAGATGA CGGCAACTAC AAGACACGTG CTGAAGTCAA GTTTGAAGGT 480
GATACCCTTG TTAATAGAAT CGAGTTAAAA GGTATTGACT TCAAGGAAGA TGGCAACATT 540
CTGGGACACA AATTGGAATA CAACTATAAC TCACACAATG TATACATCAT GGCAGACAAA 600
CAAAAGAATG GAATCAAAGT GAACTTCAAG ACCCGCCACA ACATTGAAGA TGGAAGCGTT 660
CAACTAGCAG ACCATTATCA ACAAAATACT CCAATTGGCG ATGGCCCTGT CCTTTTACCA 720
GACAACCATT ACCTGTCCAC ACAATCTGCC CTTTCGAAAG ATCCCAACGA AAAGAGAGAC 780
CACATGGTCC TTCTTGAGTT TGTAACAGCT GCTGGGATTA CACATGGCAT GGATGAACTG 840
TACAACTGA 849

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 720 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..720
(D) OTHER INFORMATION: /note= "SG12"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TTAACTTCAA AATTAGACAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 720 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..720
(D) OTHER INFORMATION: /note= "SG11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTCAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACTGA 720

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 720 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..720
(D) OTHER INFORMATION: /note= "SG25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTGTGCTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACTGA 720

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..40
(D) OTHER INFORMATION: /note= "oligonucleotide #18217"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

CATTGAACAC CATAGCACAG AGTAGTGACT AGTGTTGGCC 40

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 720 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..720
(D) OTHER INFORMATION: /note= "SB42"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TTAACTTCAA AATTAGACAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..40
(D) OTHER INFORMATION: /note= "oligonucleotide #bio25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CATTGAACAC CATGAGAGAG AGTAGTGACT AGTGTTGGCC 40

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 720 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..720
(D) OTHER INFORMATION: /note= "SB49"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTTTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG CGAACTTCAA GATCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 44 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..44
(D) OTHER INFORMATION: /note= "oligonucleotide #19059"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

CTTCAATCTT GTGGCGGATG TTGAAGTTCG CTTTGATTCC ATTC 44



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..40
(D) OTHER INFORMATION: /note= "oligonucleotide #bio24"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CATTGAACAC CATGAGAGAA AGTAGTGACT AGTGTTGGCC 40



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 720 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..720
(D) OTHER INFORMATION: /note= "SB50"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG CGAACTTCAA GATCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1521 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..1521
(D) OTHER INFORMATION: /note= "pCMVgfp11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TCCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACGGT 720
GCTGGTGCTA TCGAACAAGA TGGATTGCAC GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG 780
CTATTCGGCT ATGACTGGGC ACAACAGACA ATCGGCTGCT CTGATGCCGC CGTGTTCCGG 840
CTGTCAGCGC AGGGGCGGCC GGTTCTTTTT GTCAAGACCG ACCTGTCCGG TGCCCTGAAT 900
GAACTGCAGG ACGAGGCAGC GCGGCTATCG TGGCTGGCCA CGACGGGCGT TCCTTGCGCA 960
GCTGTGCTCG ACGTTCTCAC TGAAGCGGGA AGGGACTGGC TGCTATTGGG CGAAGTGCCG 1020
GGGCAGGATC TCCTGTCATC TCACCTTGCT CCTGCCGAGA AAGTATCCAT CATGGCTGAT 1080
GCAATGCGGC GGCTGCATAC GCTTGATCCG GCTACCTGCC CATTCGACCA CCAAGCGAAA 1140
CATCGCATCG AGCGAGCACG TACTCGGATG GAAGCCGGTC TTGTCGATCA GGATGATCTG 1200
GACGAAGAGC ATCAGGGGCT CGCGCCAGCC GAACTGTTCG CCAGGCTCAA GGCGCGCATG 1260
CCCGACGGCG AGGATCTCGT CGTGACGCAT GGCGATGCCT GCTTGCCGAA TATCATGGTG 1320
GAAAATGGCC GCTTTTCTGG ATTCATCGAC TGTGGCCGGC TGGGTGTGGC GGACCGCTAT 1380
CAGGACATAG CGTTGGCTAC CCGTGATATT GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC 1440
CGCTTCCTCG TGCTTTACGG TATCGCCGCT CCCGATTCGC AGCGCATCGC CTTCTATCGC 1500
GTTCTTGACG AGTTCTTCTG A 1521

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Gly Ala Gly Ala
1

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..32
(D) OTHER INFORMATION: /note= "primer Bio51"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

CGCGGATCCT TCGAACAAGA TGGATTGCAC GC 32

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..34
(D) OTHER INFORMATION: /note= "primer BioSI"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CCGGAATTCT CAGAAGAACT CGTCAAGAAG GCGA 34

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 46 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..46
(D) OTHER INFORMATION: /note= "primer Bio49"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

GGCGCGCAAG AAATGGCTAG CAAAGGAGAA GAACTCTTCA CTGGAG 46

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 46 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..46
(D) OTHER INFORMATION: /note= "primer Bio50"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

CCCATCGATA GCACCAGCAC CGTTGTACAG TTCATCCATG CCATGT 46

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1521 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..1521
(D) OTHER INFORMATION: /note= "pPGKgfp25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

ATGCCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTGTGCTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATCACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACGGT 720
GCTGGTGCTA TCGAACAAGA TGGATTGCAC GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG 780
CTATTCGGCT ATGACTGGGC ACAACAGACA ATCGGCTGCT CTGATGCCGC CGTGTTCCGG 840
CTGTCAGCGC AGGGGCGCCC GGTTCTTTTT GTCAAGACCG ACCTGTCCGG TGCCCTGAAT 900
GAACTGCAGG ACGAGGCAGC GCGGCTATCG TGGCTGGCCA CGACGGGCGT TCCTTGCGCA 960
GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA AGGGACTGGC TGCTATTGGG CGAAGTGCCG 1020
GGGCAGGATC TCCTGTCATC TCACCTTGCT CCTGCCGAGA AAGTATCCAT CATGGCTGAT 1080
GCAATGCGGC GGCTGCATAC GCTTGATCCG GCTACCTGCC CATTCGACCA CCAAGCGAAA 1140
CATCGCATCG AGCGAGCACG TACTCGGATG GAAGCCGGTC TTGTCGATCA GGATGATCTG 1200
GACGAAGAGC ATCAGGGGCT CGCGCCAGCC GAACTGTTCG CCAGGCTCAA GGCGCGCATG 1260
CCCGACGGCG AGGATCTCGT CGTGACCCAT GGCGATGCCT GCTTGCCGAA TATCATGGTG 1320
GAAAATGGCC GCTTTTCTGG ATTCATCGAC TGTGGCCGGC TGGGTGTGGC GGACCGCTAT 1380
CAGGACATAG CGTTGGCTAC CCGTGATATT GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC 1440
CGCTTCCTCG TGCTTTACGG TATCGCCGCT CCCGATTCGC AGCGCATCGC CTTCTATCGC 1500
CTTCTTGACG AGTTCTTCTG A 1521

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..26
(D) OTHER INFORMATION: /note= "oligonucleotide #18990"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

GACCGGGACA CGTATCCAGC CTCCGC 26

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..28
(D) OTHER INFORMATION: /note= "oligonucleotide #18991"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

GGAGGCTGGA TACGTGTCCC GGTCTGCA 28

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7617 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..7617
(D) OTHER INFORMATION: /note= "pGen-PGKgfp25RO"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

TCGAGGTCGA CGGTATCGAT TAGTCCAATT TGTTAAAGAC AGGATATCAG TGGTCCAGGC 60
TCTAGTTTTG ACTCAACAAT ATCACCAGCT GAAGCCTATA GAGTACGAGC CATAGATAAA 120
ATAAAAGATT TTATTTAGTC TCCAGAAAAA GGGGGGAATG AAAGACCCCA CCTGTAGGTT 180
TGGCAAGCTA GCTTAAGTAA CCCCATTTTG CAAGGCATGG AAAAATACAT AACTGAGAAT 240
AGAGAAGTTC AGATCAAGGT CAGGAACAGA TGGAACAGCT GAATATGGGC CAAACAGGAT 300
ATCTGTGGTA AGCAGTTCCT GCCCCGGCTC AGGGCCAAGA ACAGATGGAA CAGCTGAATA 360
TGGGCCAAAC AGGATATCTG TGGTAAGCAG TTCCTGCCCC GGCTCAGGGC CAAGAACAGA 420
TGGTCCCCAG ATGCGGTCCA GCCCTCAGCA GTTTCTAGAG AACCATCAGA TGTTTCCAGG 480
GTGCCCCAAG GACCTGAAAT GACCCTGTGC CTTATTTGAA CTAACCAATC AGTTCGCTTC 540
TCGCTTCTGT TCGCGCGCTT CTGCTCCCCG AGCTCAATAA AAGAGCCCAC AACCCCTCAC 600
TCGGGGCGCC AGTCCTCCGA TTGACTGAGT CGCCCGGGTA CCCGTGTATC CAATAAACCC 660
TCTTGCAGTT GCATCCGACT TGTGGTCTCG CTGTTCCTTG GGAGGGTCTC CTCTGAGTGA 720
TTGACTACCC GTCAGCGGGG GTCTTTCATT TGGGGGCTCG TCCGGGATCG GGAGACCCCT 780
GCCCAGGGAC CACCGACCCA CCACCGGGAG GTAAGCTGGC CAGCAACTTA TCTGTGTCTG 840
TCCGATTGTC TAGTGTCTAT GACTGATTTT ATGCGCCTGC GTCGGTACTA GTTAGCTAAC 900
TAGCTCTGTA TCTGGCGGAC CCGTGGTGGA ACTGACGAGT TCGGAACACC CGGCCGCAAC 960
CCTGGGAGAC GTCCCAGGGA CTTCGGGGGC CGTTTTTGTG GCCCGACCTG AGTCCAAAAA 1020
TCCCGATCGT TTTGGACTCT TTGGTGCACC CCCCTTAGAG GAGGGATATG TGGTTCTGGT 1080
AGGAGACGAG AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGGA 1140
CCGAAGCCGC GCCGCGCGTC TTGTCTGCTG CAGCATCGTT CTGTGTTGTC TCTGTCTGAC 1200
TGTGTTTCTG TATTTGTCTG AGAATATGGG CCAGACTGTT ACCACTCCCT TAAGTTTGAC 1260
CTTAGGTCAC TGGAAAGATG TCGAGCGGAT CGCTCACAAC CAGTCGGTAG ATGTCAAGAA 1320
GAGACGTTGG GTTACCTTCT GCTCTGCAGA ATGGCCAACC TTTAACGTCG GATGGCCGCG 1380
AGACGGCACC TTTAACCGAG ACCTCATCAC CCAGGTTAAG ATCAAGGTCT TTTCACCTGG 1440
CCCGCATGGA CACCCAGACC AGGTCCCCTA CATCGTGACC TGGGAAGCCT TGGCTTTTGA 1500
CCCCCCTCCC TGGGTCAAGC CCTTTGTACA CCCTAAGCCT CCGCCTCCTC TTCCTCCATC 1560
CGCCCCGTCT CTCCCCCTTG AACCTCCTCG TTCGACCCCG CCTCGATCCT CCCTTTATCC 1620
AGCCCTCACT CCTTCTCGAC GGTATACAGA CATGATAAGA TACATTGATG AGTTTGGACA 1680
AACCACAACT AGAATGCAGT GAAAAAAATG CTTTATTTGT GAAATTTGTG ATGCTATTGC 1740
TTTATTTGTA ACCATTATAA GCTGCAATAA ACAAGTTGGG GTGGGCGAAG AACTCCAGCA 1800
TGAGATCCCC GCGCTGGAGG ATCATCCAGC CGGCGAACGT GGCGAGAAAG GAAGGGAAGA 1860



WO 97/42320 



PCT/US97/07625



AAGCGAAAGG AGCGGGCGCT AGGGCGCTGG CAAGTGTAGC GGTCACGCTG CGCGTAACCA 1920
CCACACCCGC CGCGCTTAAT GCGCCGCTAC AGGGCGCGTG GGGATACCCC CTAGAGCCCC 1980
AGCTGGTTCT TTCCGCCTCA GAAGCCATAG AGCCCACCGC ATCCCCAGGA TGCCTGCTAT 2040
TGTCTTCCCA ATCCTCCCCC TTGCTGTCCT GCCCCACCCC ACCCCCCAGA ATAGAATGAC 2100
ACCTACTCAG ACAATGCGAT GCAATTTCCT CATTTTATTA GGAAAGGACA GTGGGAGTGG 2160
CACCTTCCAG GGTCAAGGAA GGCACGGGGG AGGGGCAAAC AACAGATGGC TGGCAACTAG 2220
AAGGCACAGT CGAGGCTGAT CAGCGAGCTC TAGCATTTAG GTGACACTAT AGAATAGGGC 2280
CCTCTAGATG CATGCTCGAG CGGCCGCCAG TGTGATGGAT ATCTGCAGAA TTCTCAGAAG 2340
AACTCGTCAA GAAGGCGATA GAAGGCGATG CGCTGCGAAT CGGGAGCGGC GATACCGTAA 2400
AGCACGAGGA AGCGGTCAGC CCATTCGCCG CCAAGCTCTT CAGCAATATC ACGGGTAGCC 2460
AACGCTATGT CCTGATAGCG GTCCGCCACA CCCAGCCGGC CACAGTCGAT GAATCCAGAA 2520
AAGCGGCCAT TTTCCACCAT GATATTCGGC AAGCAGGCAT CGCCATGGGT CACGACGAGA 2580
TCCTCGCCGT CGGGCATGCG CGCCTTGAGC CTGGCGAACA GTTCGGCTGG CGCGAGCCCC 2640
TGATCCTCTT CGTCCAGATC ATCCTGATCG ACAAGACCGG CTTCCATCCG AGTACGTGCT 2700
CGCTCGATGC GATGTTTCGC TTGGTGGTCG AATGGGCAGG TAGCCGGATC AAGCGTATGC 2760
AGCCGCCGCA TTGCATCAGC CATGATGGAT ACTTTCTCGG CAGGAGCAAG GTGAGATGAC 2820
AGGAGATCCT GCCCGGGCAC TTCGCCCAAT AGCAGCCAGT CCCTTCCCGC TTCAGTGACA 2880
ACGTCGAGCA CAGCTGCGCA AGGAACGCCC GTCGTGGCCA GCCACGATAG CCGCGCTGCC 2940
TCGTCCTGCA GTTCATTCAG GGCACCGGAC AGGTCGGTCT TGACAAAAAG AACCGGGCGC 3000
CCCTGCGCTG ACAGCCGGAA CACGGCGGCA TCAGAGCAGC CGATTGTCTG TTGTGCCCAG 3060
TCATAGCCGA ATAGCCTCTC CACCCAAGCG GCCGGAGAAC CTGCGTGCAA TCCATCTTGT 3120
TCGATAGCAC CAGCACCGTT GTACAGTTCA TCCATGCCAT GTGTAATCCC AGCAGCTGTT 3180
ACAAACTCAA GAAGGACCAT GTGGTCTCTC TTTTCGTTGG GATCTTTCGA AAGGGCAGAT 3240
TGTGTGGACA GGTAATGGTT GTCTGGTAAA AGGACAGGGC CATCGCCAAT TGGAGTATTT 3300
TGTTGATAAT GGTCTGCTAG TTGAACGCTT CCATCTTCAA TGTTGTGGCG GGTCTTGAAG 3360
TTCACTTTGA TTCCATTCTT TTGTTTGTCT GCCATGATGT ATACATTGTG TGAGTTATAG 3420
TTGTATTCCA ATTTGTGTCC CAGAATGTTG CCATCTTCCT TGAAGTCAAT ACCTTTTAAC 3480
TCGATTCTAT TAACAAGGGT ATCACCTTCA AACTTGACTT CAGCACGTGT CTTGTAGTTG 3540
CCGTCATCTT TGAAGAAGAT GGTCCTTTCC TGTACATAAC CTTCGGGCAT GGCACTCTTG 3600
AAAAAGTCAT GCCGTTTCAT ATGATCCGGG TATCTTGAAA AGCATTGAAC ACCATAGCAC 3660
AGAGTAGTGA CTAGTGTTGG CCATGGAACA GGCAGTTTGC CAGTAGTGCA GATGAACTTC 3720
AGGGTAAGTT TTCCGTATGT TGCATCACCT TCACCCTCTC CACTGACAGA GAACTTGTGG 3780
CCGTTAACAT CACCATCTAA TTCAACAAGA ATTGGGACAA CTCCAGTGAA GAGTTCTTCT 3840
CCTTTGCTAG CCATTTCTTG CGCGCCCGCG GAGGCTGGAT ACGTGTCCCG GTCTGCAGGT 3900
CGAAAGGCCG GGAGATGAGG AAGAGGAGAA CAGCGCGGCA GACGTGCGCT TTTGAAGCGT 3960
GCAGAATGCC GGGCTCCGGA GGACCTTCGC GCCCGCCCCG CCCCTGAGCC CGCCCCTGAG 4020
CCCGCCCCCG GACCCACCCC TTCCCAGCCT CTGAGCCCAG AAAGCGAAGG AGCCAAGCTG 4080
CTATTGGCCG CTGCCCCAAA GGCCTACCCG CTTCCATTGC TCAGCGGTGC TGTCCATCTG 4140
CACGAGACTA GTGAGACGTG CTACTTCCAT TTGTCACGTC CTGCACGACG CGAGCTGCGG 4200
GGCGGGGGGG AACTTCCTGA CTAGGGGAGG AGTAGAAGGT GGCGCGAAGG GGCCACCAAA 4260
GAAGGGAGCC GGTTGGCGCT ACCGGTGGAT GTGGAATGTG TGCGAGGCCA GAGGCCACTT 4320
GTGTAGCGCC AAGTGCCAGC GGGGCTGCTA AAGCGCATGC TCCAGACTGC CTTGGGAAAA 4380
GCGCCTCCCC TACCCGGTAG AATTCGATAT CAAGCTTATC GATACCGTCG AGATCTCCCG 4440
ATCCGTCGAG GTCGACGGTA TCGATTAGTC CAATTTGTTA AAGACAGGAT ATCAGTGGTC 4500
CAGGCTCTAG TTTTGACTCA ACAATATCAC CAGCTGAAGC CTATAGAGTA CGAGCCATAG 4560
ATAAAATAAA AGATTTTATT TAGTCTCCAG AAAAAGGGGG GAATGAAAGA CCCCACCTGT 4620
AGGTTTGGCA AGCTAGCTTA AGTAACGCCA TTTTGCAAGG CATGGAAAAA TACATAACTG 4680
AGAATAGAGA AGTTCAGATC GGGATCCCAA TTCTTTCGGA CTTTTGAAAG TGATGGTGGT 4740
GGGGGAAGGA TTCGAACCTT CGAAGTCGAT GACGGCAGAT TTAGAGTCTG CTCCCTTTGG 4800
CCGCTCGGGA ACCCCACCAC GGGTAATGCT TTTACTGGCC TGCTCCCTTA TCGGGAAGCG 4860
GGGCGCATCA TATCAAATGA CGCGCCGCTG TAAAGTGTTA CGTTGAGAAA GAATTGGGAT 4920
CCCGATCAAG GTCAGGAACA GATGGAACAG CTAGAGAACC ATCAGATGTT TCCAGGGTGC 4980
CCCAAGGACC TGAAATGACC CTGTGCCTTA TTTGAACTAA CCAATCAGTT CGCTTCTCGC 5040
TTCTGTTCGC GCGCTTCTGC TCCCCGAGCT CAATAAAAGA GCCCACAACC CCTCACTCGG 5100
GGCGCCAGTC CTCCGATTGA CTGAGTCGCC CGGGTACCCG TGTATCCAAT AAACCCTCTT 5160
GCAGTTGCAT CCGACTTGTG GTCTCGCTGT TCCTTGGGAG GGTCTCCTCT GAGTGATTGA 5220
CTACCCGTCA GCGGGGGTCT TTCACCCAGA GTTTGGAACT TACTGTCTTC TTGGGACCTG 5280
CAGCCCGGGG GATCCACTAG TTCTAGAGCG GCCGCCACCG CGGTGGATTC TGCCTCGCGC 5340
GTTTCGGTGA TGACGGTGAA AACCTCTGAC ACATGCAGCT CCCGGAGACG GTCACAGCTT 5400
GTCTGTAAGC GGATGCCGGG AGCAGACAAG CCCGTCAGGG CGCGTCAGCG GGTGTTGGCG 5460
GGTGTCGGGG CGCAGCCATG ACCCAGTCAC GTAGCGATAG CGGAGTGTAT ACTGGCTTAA 5520
CTATGCGGCA TCAGAGCAGA TTGTACTGAG AGTGCACCAT ATGCGGTGTG AAATACCGCA 5580
CAGATGCGTA AGGAGAAAAT ACCGCATCAG GCGCTCTTCC GCTTCCTCGC TCACTGACTC 5640
GCTGCGCTCG GTCGTTCGGC TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG 5700
GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG TGAGCAAAAG GCCAGCAAAA 5760
GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA 5820
CGAGCATCAC AAAAATCGAC GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG 5880
ATACCAGGCG TTTCCCCCTG GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT 5940
TACCGGATAC CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG GCGCTTTCTC AATGCTCACG 6000
CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC 6060
CCCCGTTCAG CCCGACCGCT GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT 6120
AAGACACGAC TTATCGCCAC TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA 6180
TGTAGGCGGT GCTACAGAGT TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGGAC 6240
AGTATTTGGT ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 6300
TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT 6360
TACGCGCAGA AAAAAAGGAT CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC 6420
TCAGTGGAAC GAAAACTCAC GTTAAGGGAT TTTGGTCATG AGATTATCAA AAAGGATCTT 6480
CACCTAGATC CTTTTAAATT AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA 6540
AACTTGGTCT GACAGTTACC AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT 6600
ATTTCGTTCA TCCATAGTTG CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG 6660
CTTACCATCT GGCCCCAGTG CTGCAATGAT ACCGCGAGAC CCACGCTCAC CGGCTCCAGA 6720
TTTATCAGCA ATAAACCAGC CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT 6780
ATCCGCCTCC ATCCAGTCTA TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT 6840
TAATAGTTTG CGCAACGTTG TTGCCATTGC TGCAGGCATC GTGGTGTCAC GCTCGTCGTT 6900
TGGTATGGCT TCATTCAGCT CCGGTTCCCA ACGATCAAGG CGAGTTACAT GATCCCCCAT 6960
GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC 7020
CGCAGTGTTA TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC 7080
CGTAAGATGC TTTTCTGTGA CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT 7140
GCGGCGACCG AGTTGCTCTT GCCCGGCGTC AACACGGGAT AATACCGCGC CACATAGCAG 7200
AACTTTAAAA GTGCTCATCA TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT 7260
ACCGCTGTTG AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC 7320
TTTTACTTTC ACCAGCGTTT CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA 7380
GGGAATAAGG GCGACACGGA AATGTTGAAT ACTCATACTC TTCCTTTTTC AATATTATTG 7440
AAGCATTTAT CAGGGTTATT GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA 7500
TAAACAAATA GGGGTTCCGC GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC 7560
CATTATTATC ATGACATTAA CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTCT 7617

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15581 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..15581
(D) OTHER INFORMATION: /note= "pNLnSG11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

TGGAAGGGCT AATTTGGTCC CAAAAAAGAC AAGAGATCCT TGATCTGTGG ATCTACCACA 60
CACAAGGCTA CTTCCCTGAT TGGCAGAACT ACACACCAGG GCCAGGGATC AGATATCCAC 120
TGACCTTTGG ATGGTGCTTC AAGTTAGTAC CAGTTGAACC AGAGCAAGTA GAAGAGGCCA 180
AATAAGGAGA GAAGAACAGC TTGTTACACC CTATGAGCCA GCATGGGATG GAGGACCCGG 240
AGGGAGAAGT ATTAGTGTGG AAGTTTGACA GCCTCCTAGC ATTTCGTCAC ATGGCCCGAG 300
AGCTGCATCC GGAGTACTAC AAAGACTGCT GACATCGAGC TTTCTACAAG GGACTTTCCG 360
CTGGGGACTT TCCAGGGAGG TGTGGCCTGG GCGGGACTGG GGAGTGGCGA GCCCTCAGAT 420
GCTACATATA AGCAGCTGCT TTTTGCCTGT ACTGGGTCTC TCTGGTTAGA CCAGATCTGA 480
GCCTGGGAGC TCTCTGGCTA ACTAGGGAAC CCACTGCTTA AGCCTCAATA AAGCTTGCCT 540
TGAGTGCTCA AAGTAGTGTG TGCCCGTCTG TTGTGTGACT CTGGTAACTA GAGATCCCTC 600
AGACCCTTTT AGTCAGTGTG GAAAATCTCT AGCAGTGGCG CCCGAACAGG GACTTGAAAG 660
CGAAAGTAAA GCCAGAGGAG ATCTCTCGAC GCAGGACTCG GCTTGCTGAA GCGCGCACGG 720
CAAGAGGCGA GGGGCGGCGA CTGGTGAGTA CGCCAAAAAT TTTGACTAGC GGAGGCTAGA 780
AGGAGAGAGA TGGGTGCGAG AGCGTCGGTA TTAAGCGGGG GAGAATTAGA TAAATGGGAA 840
AAAATTCGGT TAAGGCCAGG GGGAAAGAAA CAATATAAAC TAAAACATAT AGTATGGGCA 900
AGCAGGGAGC TAGAACGATT CGCAGTTAAT CCTGGCCTTT TAGAGACATC AGAAGGCTGT 960
AGACAAATAC TGGGACAGCT ACAACCATCC CTTCAGACAG GATCAGAAGA ACTTAGATCA 1020
TTATATAATA CAATAGCAGT CCTCTATTGT GTGCATCAAA GGATAGATGT AAAAGACACC 1080
AAGGAAGCCT TAGATAAGAT AGAGGAAGAG CAAAACAAAA GTAAGAAAAA GGCACAGCAA 1140
GCAGCAGCTG ACACAGGAAA CAACAGCCAG GTCAGCCAAA ATTACCCTAT AGTGCAGAAC 1200
CTCCAGGGGC AAATGGTACA TCAGGCCATA TCACCTAGAA CTTTAAATGC ATGGGTAAAA 1260
GTAGTAGAAG AGAAGGCTTT CAGCCCAGAA GTAATACCCA TGTTTTCAGC ATTATCAGAA 1320
GGAGCCACCC CACAAGATTT AAATACCATG CTAAACACAG TGGGGGGACA TCAAGCAGCC 1380
ATGCAAATGT TAAAAGAGAC CATCAATGAG GAAGCTGCAG AATGGGATAG ATTGCATCCA 1440
GTGCATGCAG GGCCTATTGC ACCAGGCCAG ATGAGAGAAC CAAGGGGAAG TGACATAGCA 1500
GGAACTACTA GTACCCTTCA GGAACAAATA GGATGGATGA CACATAATCC ACCTATCCCA 1560
GTAGGAGAAA TCTATAAAAG ATGGATAATC CTGGGATTAA ATAAAATAGT AAGAATGTAT 1620
AGCCCTACCA GCATTCTGGA CATAAGACAA GGACCAAAGG AACCCTTTAG AGACTATGTA 1680
GACCGATTCT ATAAAACTCT AAGAGCCGAG CAAGCTTCAC AAGAGGTAAA AAATTGGATG 1740
ACAGAAACCT TGTTGGTCCA AAATGCGAAC CCAGATTGTA AGACTATTTT AAAAGCATTG 1800
GGACCAGGAG CGACACTAGA AGAAATGATG ACAGCATGTC AGGGAGTGGG GGGACCCGGC 1860
CATAAAGCAA GAGTTTTGGC TGAAGCAATG AGCCAAGTAA CAAATCCAGC TACCATAATG 1920
ATACAGAAAG GCAATTTTAG GAACCAAAGA AAGACTGTTA AGTGTTTCAA TTGTGGCAAA 1980
GAAGGGCACA TAGCCAAAAA TTGCAGGGCC CCTAGGAAAA AGGGCTGTTG GAAATGTGGA 2040
AAGGAAGGAC ACCAAATGAA AGATTGTACT GAGAGACAGG CTAATTTTTT AGGGAAGATC 2100
TGGCCTTCCC ACAAGGGAAG GCCAGGGAAT TTTCTTCAGA GCAGACCAGA GCCAACAGCC 2160
CCACCAGAAG AGAGCTTCAG GTTTGGGGAA GAGACAACAA CTCCCTCTCA GAAGCAGGAG 2220
CCGATAGACA AGGAACTGTA TCCTTTAGCT TCCCTCAGAT CACTCTTTGG CAGCGACCCC 2280
TCGTCACAAT AAAGATAGGG GGGCAATTAA AGGAAGCTCT ATTAGATACA GGAGCAGATG 2340
ATACAGTATT AGAAGAAATG AATTTGCCAG GAAGATGGAA ACCAAAAATG ATAGGGGGAA 2400
TTGGAGGTTT TATCAAAGTA GGACAGTATG ATCAGATACT CATAGAAATC TGCGGACATA 2460
AAGCTATAGG TACAGTATTA GTAGGACCTA CACCTGTCAA CATAATTGGA AGAAATCTGT 2520
TGACTCAGAT TGGCTGCACT TTAAATTTTC CCATTAGTCC TATTGAGACT GTACCAGTAA 2580
AATTAAAGCC AGGAATGGAT GGCCCAAAAG TTAAACAATG GCCATTGACA GAAGAAAAAA 2640
TAAAAGCATT AGTAGAAATT TGTACAGAAA TGGAAAAGGA AGGAAAAATT TCAAAAATTG 2700
GGCCTGAAAA TCCATACAAT ACTCCAGTAT TTGCCATAAA GAAAAAAGAC AGTACTAAAT 2760
GGAGAAAATT AGTAGATTTC AGAGAACTTA ATAAGAGAAC TCAAGATTTC TGGGAAGTTC 2820
AATTAGGAAT ACCACATCCT GCAGGGTTAA AACAGAAAAA ATCAGTAACA GTACTGGATG 2880
TGGGCGATGC ATATTTTTCA GTTCCCTTAG ATAAAGACTT CAGGAAGTAT ACTGCATTTA 2940
CCATACCTAG TATAAACAAT GAGACACCAG GGATTAGATA TCAGTACAAT GTGCTTCCAC 3000
AGGGATGGAA AGGATCACCA GCAATATTCC AGTGTAGCAT GACAAAAATC TTAGAGCCTT 3060
TTAGAAAACA AAATCCAGAC ATAGTCATCT ATCAATACAT GGATGATTTG TATGTAGGAT 3120
CTGACTTAGA AATAGGGCAG CATAGAACAA AAATAGAGGA ACTGAGACAA CATCTGTTGA 3180
GGTGGGGATT TACCACACCA GACAAAAAAC ATCAGAAAGA ACCTCCATTC CTTTGGATGG 3240
GTTATGAACT CCATCCTGAT AAATGGACAG TACAGCCTAT AGTGCTGCCA GAAAAGGACA 3300
GCTGGACTGT CAATGACATA CAGAAATTAG TGGGAAAATT GAATTGGGCA AGTCAGATTT 3360
ATGCAGGGAT TAAAGTAAGG CAATTATGTA AACTTCTTAG GGGAACCAAA GCACTAACAG 3420
AAGTAGTACC ACTAACAGAA GAAGCAGAGC TAGAACTGGC AGAAAACAGG GAGATTCTAA 3480
AAGAACCGGT ACATGGAGTG TATTATGACC CATCAAAAGA CTTAATAGCA GAAATACAGA 3540
AGCAGGGGCA AGGCCAATGG ACATATCAAA TTTATCAAGA GCCATTTAAA AATCTGAAAA 3600
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CAGGAAAATA TGCAAGAATG AAGGGTGCCC ACACTAATGA TGTGAAACAA TTAACAGAGG 3 66 0 

CAGTACAAAA AATAG C CAC A GAAAGCATAG TAATATGGGG AAAGACTCCT AAATTTAAAT 3 72 0 

TACCCATACA AAAGGAAACA TGGG AAG CAT GGTGGACAGA GTATTGGCAA GCCACCTGGA 3 7 80 

TTCCTGAGTG GGAGTTTGTC AATACCCCTC CCTTAGTGAA GTTATGGTAC CAGTTAGAGA 38 4 0 

AAGAACCCAT AATAGGAGCA GAAACTTTCT ATGTAGATGG GGCAGCCAAT AG GG AAACTA 3 90 0 

AATTAGGAAA AGCAGGATAT GTAACTGACA GAGGAAGACA AAAAGTTGTC CCCCTAACGG 3 96 0 

ACACAACAAA TCAGAAGACT GAGTTACAAG CAATTCATCT AGCTTTGCAG GATTCGGGAT 4 02 0 

TAGAAGTAAA CATAGTGACA GACTCACAAT ATGCATTGGG AATCATTCAA GCACAACCAG 40 BO 

ATAAGAGTGA ATCAGAGTTA GTCAGTCAAA T AATAG AG C A GTTAATAAAA AAGGAAAAAG 414 0 

TCTACCTGGC ATGGGTACCA GCACACAAAG GAATTGGAGG AAATGAACAA GTAGATGGGT 4200 

TGGTCAGTGC TGGAATCAGG AAAGTACT7.T TTTTAGATGG AATAGATAAG GCCCAAGAAG 4 26 0 

AACATCAGAA ATATCACAGT AATTGGAGAG CAATGGCTAG TGATTTTAAC CTACCACCTG 4 320 

TAGTAGCAAA AGAAA7AGTA GCCAGCTGTG ATAAATGTCA GCTAAAAGGG GAAGCCATGC 4 3 80 

ATGGACAACT ACACTGTAGC CCAGGAATAT GGC AG CT AG A TTGTACACAT TTAGAAGGAA 444 0 

AAG TTATCTT GGTAGCAGTT CATGTAGCCA G TG G A TAT AT AGAAGCAGAA GTAATTCCAG 4 500 

CAGAGACAGG CCAACAAACA GCATACTTCC TCTTAAAATT AGCAGGAAGA TGGCCAGTAA 4 56 0 

AAACAGTAGA TACAGACAAT GGCAGCAATT TCACCAGTAC TACAGTTAAG GCCGCCTGTT 4 62 0 

GGTGGGCGGG GATCAAGCAG GAATTTGGCA TTCCCTACAA TCCCCAAAGT CAAGGAGTAA 4 68 0 

TAGAATCTAT GAATAAAGAA TTAAAGAAAA TTATAGGACA G G T AAG AG AT CAGGCTGAAC 4 74 0 

ATCTTAAGAC AGCAGTACAA ATGGCAGTAT TCATCCACAA TTTTAAAAGA AAAGGGGGGA 4 8 00 

TTGGGGGGTA CAGTGCAGGG GAAAG AATAG TAGACATAAT AGCAACAGAC ATACAAACTA 4 86 0 

AAGAATTACA AAAACAAATT ACAAAAATTC AAAATTTTCG GGTTTATTAC AGGGACAGCA 4 92 0 

GAGATCCAGT TTCGAAACGA CCAGCAAAGC TCCTCTGGAA AGGTGAAGGG GCAGTAGTAA 4 98 0 

TACAAGATAA TAGTGACATA AAAGTAGTGC CAAGAAGAAA AGCAAAGATC ATCAGGGATT 5 04 0 

ATGGAAAACA GATGGCAGGT GATGATTGTG TGGCAAGTAG ACAGGATGAG GATTAACACA 5100 

TGGAAAAGAT TAGTAAAACA CCATATGTAT ATTTCAAGGA AAGCTAAGGA CTGGTTTTAT 516 0 

AGACATCACT ATGAAAGTAC TAATCCAAAA ATAAGTTCAG AAGTACACAT CCCACTAGGG 5 22 0 

GATGCTAAAT TAGTAATAAC AACATATTGG GGTCTGCATA CAGGAGAAAG AGACTGGCAT 52 8 0 

TTGGGTCAGG GAGTCTCCAT AGAATGGAGG AAAAAGAGAT ATAGCACACA AGTAGACCCT 5 340 

G AC CT AG C A G ACCAACTAAT TCATCTGCAC TATTTTGATT G TTTTTC AG A ATCTGCTATA 54 00 

AGAAATACCA TATTAGGACG TAT AG TT AG T CCTAGGTGTG AATATCAAGC AGGACATAAC 54 6 0 

AAGGTAGGAT CTCTACAGTA CTTGG C ACT A GCAGCATTAA TAAAACCAAA ACAGATAAAG 5 52 0 

CCACCTTTGC CTAGTGTTAG GAAACTGACA GAGGACAGAT GG AACAAGCC C C AG AAG AC C 5 58 0 

AAGGGCCACA GAGGGAGCCA TACAATGAAT GGACACTAGA GCTTTTAGAG GAACTTAAGA 564 0 

GTGAAGCTGT TAGACATTTT CCTAGGATAT GGCTCCATAA CTTAGGACAA CATATCTATG 5700
AAACTTACGG GGATACTTGG GCAGGAGTGG AAGCCATAAT AAGAATTCTG CAACAACTGC 5760
TGTTTATCCA TTTCAGAATT GGGTGTCGAC ATAGCAGAAT AGGCGTTACT CGACAGAGGA 5820
GAGCAAGAAA TGGAGCCAGT AGATOCTAGA CTAGAGCCCT GGAAGCATCC AGGAAGTCAG 5880
CCTAAAACTG CTTGTACCAA TTGCTATTGT AAAAAGTGTT GCTTTCATTG CCAAGTTTGT 5940
TTCATGACAA AAGCCTTAGG CATCTCCTAT GGCAGGAAGA AGCGGAGACA GCGACGAAGA 6000
GCTCATCAGA ACAGTCAGAC TCATCAAGCT TCTCTATCAA AGCAGTAAGT AGTACATGTA 6060
ATGCAACCTA TAATAGTAGC AATAGTAGCA TTAGTAGTAG CAATAATAAT AGCAATAGTT 6120
GTGTGGTCCA TAGTAATCAT AGAATATAGG AAAATATTAA GACAAAGAAA AATAGACAGG 6180
TTAATTGATA GACTAATAGA AAGAGCAGAA GACAGTGGCA ATGAGAGTGA AGGAGAAGTA 6240
TCAGCACTTG TGGAGATGGG GGTGGAAATG GGGCACCATG CTCCTTGGGA TATTGATGAT 6300
CTGTAGTGCT ACAGAAAAAT TGTGGGTCAC AGTCTATTAT GGGGTACCTG TGTGGAAGGA 6360
AGCAACCACC ACTCTATTTT GTGCATCAGA TGCTAAAGCA TATGATACAG AGGTACATAA 6420
TGTTTGGGCC ACACATGCCT GTGTACCCAC AGACCCCAAC CCACAAGAAG TAGTATTGGT 6480
AAATGTGACA GAAAATTTTA ACATGTGGAA AAATGACATG GTAGAACAGA TGCATGAGGA 6540
TATAATCAGT TTATGGGATC AAAGCCTAAA GCCATGTGTA AAATTAACCC CACTCTGTGT 6600
TAGTTTAAAG TGCACTGATT TGAAGAATGA TACTAATACC AATAGTAGTA GCGGGAGAAT 6660
GATAATGGAG AAAGGAGAGA TAAAAAACTG CTCTTTCAAT ATCAGCACAA GCATAAGAGA 6720
TAAGGTGCAG AAAGAATATG CATTCTTTTA TAAACTTGAT ATAGTACCAA TAGATAATAC 6780
CAGCTATAGG TTGATAAGTT GTAACACCTC AGTCATTACA CAGGCCTGTC CAAAGGTATC 6840
CTTTGAGCCA ATTCCCATAC ATTATTGTGC CCCGGCTGGT TTTGCGATTC TAAAATGTAA 6900
TAATAAGACG TTCAATGGAA CAGGACCATG TACAAATGTC AGCACAGTAC AATGTACACA 6960
TGGAATCAGG CCAGTAGTAT CAACTCAACT GCTGTTAAAT GGCAGTCTAG CAGAAGAAGA 7020
TGTAGTAATT AGATCTCCCA ATTTCACAGA CAATGCTAAA ACCATAATAG TACAGCTGAA 7080
CACATCTGTA GAAATTAATT GTACAAGACC CAACAACAAT ACAAGAAAAA GTATCCGTAT 7140
CCAGAGGGGA CCAGGGAGAG CATTTGTTAC AATAGGAAAA ATAGGAAATA TGAGACAAGC 7200
ACATTGTAAC ATTAGTAGAG CAAAATGGAA TGCCACTTTA AAACAGATAG CTAGCAAATT 7260
AAGAGAACAA TTTGGAAATA ATAAAACAAT AATCTTTAAG CAATCCTCAG GAGGGGACCC 7320
AGAAATTGTA ACGCACAGTT TTAATTGTGG AGGGGAATTT TTCTACTGTA ATTCAACACA 7380
ACTGTTTAAT AGTACTTGGT TTAATAGTAC TTGGAGTACT GAAGGGTCAA ATAACACTGA 7440
AGGAAGTGAC ACAATCACAC TCCCATGCAG AATAAAACAA TTTATAAACA TGTGGCAGGA 7500
AGTAGGAAAA GCAATGTATG CCCCTCCCAT CAGTGGACAA ATTAGATGTT CATCAAATAT 7560
TACTGGGCTG CTATTAACAA GAGATGGTGG TAATAACAAC AATGGGTCCG AGATCTTCAG 7620
ACCTGGAGGA GGCGATATGA GGGACAATTG GAGAAGTGAA TTATATAAAT ATAAAGTAGT 7680
AAAAATTGAA CCATTAGGAG TAGCACCCAC CAAGGCAAAG AGAAGAGTGG TGCAGAGAGA 7740
AAAAAGAGCA GTGGGAATAG GAGCTTTGTT CCTTGGGTTC TTGGGAGCAG CAGGAAGCAC 7800
TATGGGCGCA GCGTCAATGA CGCTGACGGT ACAGGCCAGA CAATTATTGT CTGATATAGT 7860
GCAGCAGCAG AACAATTTGC TGAGGGCTAT TGAGGCGCAA CAGCATCTGT TGCAACTCAC 7920
AGTCTGGGGC ATCAAACAGC TCCAGGCAAG AATCCTGGCT GTGGAAAGAT ACCTAAAGGA 7980
TCAACAGCTC CTGGGGATTT GGGGTTGCTC TGGAAAACTC ATTTGCACCA CTGCTGTGCC 8040
TTGGAATGCT AGTTGGAGTA ATAAATCTCT GGAACAGATT TGGAATAACA TGACCTGGAT 8100
GGAGTGGGAC AGAGAAATTA ACAATTACAC AAGCTTAATA CACTCCTTAA TTGAAGAATC 8160
GCAAAACCAG CAAGAAAAGA ATGAACAAGA ATTATTGGAA TTAGATAAAT GGGCAAGTTT 8220
GTGGAATTGG TTTAACATAA CAAATTGGCT GTGGTATATA AAATTATTCA TAATGATAGT 8280
AGGAGGCTTG GTAGGTTTAA GAATAGTTTT TGCTGTACTT TCTATAGTGA ATAGAGTTAG 8340
GCAGGGATAT TCACCATTAT CGTTTCAGAC CCACCTCCCA ATCCCGAGGG GACCCGACAG 8400
GCCCGAAGGA ATAGAAGAAG AAGGTGGAGA GAGAGACAGA GACAGATCCA TTCGATTAGT 8460
GAACGGATCC TTAGCACTTA TCTGGGACGA TCTGCGGAGC CTGTGCCTCT TCAGCTACCA 8520
CCGCTTGAGA GACTTACTCT TGATTGTAAC GAGGATTGTG GAACTTCTGG GACGCAGGGG 8580
GTGGGAAGCC CTCAAATATT GGTGGAATCT CCTACAGTAT TGGAGTCAGG AACTAAAGAA 8640
TAGTGCTGTT AACTTGCTCA ATGCCACAGC CATAGCAGTA GCTGAGGGGA CAGATAGGGT 8700
TATAGAAGTA TTACAAGCAG CTTATAGAGC TATTCGCCAC ATACCTAGAA GAATAAGACA 8760
GGGCTTGGAA AGGATTTTGC TATAAGATGG GTGGCAAGTG GTCAAAAAGT AGTGTGATTG 8820
GATGGCCTGC TGTAAGGGAA AGAATGAGAC GAGCTGAGCA AGAAATGGCT AGCAAAGGAG 8880
AAGAACTCTT CACTGGAGTT GTCCCAATTC TTGTTGAATT AGATGGTGAT GTTAACGGCC 8940
ACAAGTTCTC TGTCAGTGGA GAGGGTGAAG GTGATGCAAC ATACGGAAAA CTTACCCTGA 9000
AGTTCATCTG CACTACTGGC AAACTGCCTG TTCCATGGCC AACACTTGTC ACTACTCTCT 9060
CTTATGGTGT TCAATGCTTT TCAAGATACC CGGATCATAT GAAACGGCAT GACTTTTTCA 9120
AGAGTGCCAT GCCCGAAGGT TATGTACAGG AAAGGACCAT CTTCTTCAAA GATGACGGCA 9180
ACTACAAGAC ACGTGCTGAA GTCAAGTTTG AAGGTGATAC CCTTGTTAAT AGAATCGAGT 9240
TAAAAGGTAT TGACTTCAAG GAAGATGGCA ACATTCTGGG ACACAAATTG GAATACAACT 9300
ATAACTCACA CAATGTATAC ATCATGGCAG ACAAACAAAA GAATGGAATC AAAGTGAACT 9360
TCAAGACCCG CCACAACATT GAAGATGGAA GCGTTCAACT AGCAGACCAT TATCAACAAA 9420
ATACTCCAAT TGGCGATGGC CCTGTCCTTT TACCAGACAA CCATTACCTG TCCACACAAT 9480
CTGCCCTTTC GAAAGATCCC AACGAAAAGA GAGACCACAT GGTCCTTCTT GAGTTTGTAA 9540
CAGCTGCTGG GATTACACAT GGCATGGATG AACTGTACAA CGGACTCGAG ACCTAGAAAA 9600
ACATGGAGCA ATCACAAGTA GCAATACAGC AGCTAACAAT GCTGCTTGTG CCTGGCTAGA 9660
AGCACAAGAG GAGGAAGAGG TGGGTTTTCC AGTCACACCT CAGGTACCTT TAAGACCAAT 9720
GACTTACAAG GCAGCTGZAG ATCTTAGCCA CTTTTTAAAA GAAAAGGGGG GACTGGAAGG 9780
GCTAATTCAC TCCCAAAGAA GACAAGATAT CCTTGATCTG TGGATCTACC ACACACAAGG 9840
CTACTTCCCT GATTGGCAGA ACTACACACC AGGGCCAGGG GTCAGATATC CACTGACCTT 9900
TGGATGGTGC TACAAGCTAG TACCAGTTGA GCCAGATAAG GTAGAAGAGG CCAATAAAGG 9960
AGAGAACACC AGCTTGTTAC ACCCTGTGAG CCTGCATGGA ATGGATGACC CTGAGAGAGA 10020
AGTG7TAGAG TGGAGGTTTG ACAGCCGCCT AGCATTTCAT CACGTGGCCC GAGAGCTGCA 10080
TCCGGAGTAC TTCAAGAACT GCTGACATCG AGCTTGCTAC AAGGGACTTT CCGCTGGGGA 10140
CTTTCCAGGG AGGCGTGGCC TGGGCGGGAC TGGGGAGTGG CGAGCCCTCA GATGCTGCAT 10200
ATAAGCAGCT GCTTTTTGCC TGTACTGGGT CTCTCTGGTT AGACCAGATC TGAGCCTGGG 10260
AGCTCTCTGG CTAACTAGGG AACCCACTGC TTAAGCCTCA ATAAAGCTTG CCTTGAGTGC 10320
TTCAAGTAGT GTGTGCCCGT CTGTTGTGTG ACTCTGGTAA CTAGAGATCC CTCAGACCCT 10380
TTTAGTCAGT GTGGAAAATC TCTAGCACCC CCCAGGAGGT AGAGGTTGCA GTGAGCCAAG 10440
ATCGCGCCAC TGCATTCCAG CCTGGGCAAG AAAACAAGAC TGTCTAAAAT AATAATAATA 10500
AGTTAAGGGT ATTAAATATA TTTATACATG GAGGTCATAA AAATATATAT ATTTGGGCTG 10560
GGCGCAGTGG CTCACACCTG CGCCCGGCCC TTTGGGAGGC CGAGGCAGGT GGATCACCTG 10620
AGTTTGGGAG TTCCAGACGA GCCTGACCAA CATGGAGAAA CCCCTTCTCT GTGTATTTTT 10680
AGTAGATTTT ATTTTATGTG TATTTTATTC ACAGGTATTT CTGGAAAACT GAAACTGTTT 10740
TTCCTCTACT ~ GATACCAC AAGAATCATC AGCACAGAGG AAGACTTCTG TGATCAAATG 10800
TGGTGGGAGA G3GAGGTTTT CACCAGCACA TGAGCAGTCA GTTCTGCCGC AGACTCGGCG 10860
GGTGTCCTTC GGTTCAGTTC CAACACCGCC TGCCTGGAGA GAGGTCAGAC CACAGGGTGA 10920
GGGCTCAGTC CCCAAGACAT AAACACCCAA GACATAAACA CCCAACAGGT CCACCCCGCC 10980
TGCTGCCCAG GCAGAGCCGA TTCACCAAGA CGGGAATTAG GATAGAGAAA GAGTAAGTCA 11040
CACAGAGCCG GCTGTGCGGG AGAACGGAGT TCTATTATGA CTCAAATCAG TCTCCCCAAG 11100
CATTCGGGGA TCAGAG7TTT TAAGGATAAC TTAGTGTGTA GGGGGCCAGT GAGTTGGAGA 11160
TGAAAGCGTA GGGAGTCGAA GGTGTCCTTT TGCGCCGAGT CAGTTCCTGG GTGGGGGCCA 11220
CAAGATCGGA TGAGCCAGTT TATCAATCCG GGGGTGCCAG CTGATCCATG GAGTGCAGGG 11280
TCTGCAAAAT ATCTCAAGCA CTGATTGATC TTAGGTTTTA CAATAGTGAT GTTACCCCAG 11340
GAACAATTTG GGGAAGGTCA GAATCTTGTA GCCTGTAGCT GCATGACTCC TAAACCATAA 11400
TTTCTTTTTT GTTTTTTTTT TTTTATTTTT GAGACAGGGT CTCACTCTGT CACCTAGGCT 11460
GGAGTGCAGT GGTGCAATCA CAGCTCACTG CAGCCTCAAC GTCGTAAGCT CAAGCGATCC 11520
TCCCACCTCA GCCTGCCTGG TAGCTGAGAC TACAAGCGAC GCCCCAGTTA ATTTTTGTAT 11580
TTTTGGTAGA GGCAGCGTTT TGCCGTGTGG CCCTGGCTGG TCTCGAACTC CTGGGCTCAA 11640
GTGATCCAGC CTCAGCCTCC CAAAGTGCTG GGACAACCGG GGCCAGTCAC TGCACCTGGC 11700
CCTAAACCAT AATTTCTAAT CTTTTGGCTA ATTTGTTAGT CCTACAAAGG CAGTCTAGTC 11760
CCCAGGCAAA AAGGGGGTTT GTTTCGGGAA AGGGCTGTTA CTGTCTTTGT TTCAAACTAT 11820
AAACTAAGTT CCTCCTAAAC TTAGTTCGGC CTACACCCAG GAATGAACAA GGAGAGCTTG 11880
GAGGTTAGAA GCACGATGGA ATTGGTTAGG TCAGATCTCT TTCACTGTCT GAGTTATAAT 11940
TTTGCAATGG TGGTTCAAAG ACTGCCCGCT TCTGACACCA GTCGCTGCAT TAATGAATCG 12000
GCCAACGCGC GGGGAGAGGC GGTTTGCGTA TTGGCGCTCT TCCGCTTCCT CGCTCACTGA 12060
CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT 12120
ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA 12180
AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCC 12240
TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG CGAAACCCGA CAGGACTATA 12300
AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC -CTCCTGTTC CGACCCTGCC 12360
GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCAATGCTC 12420
ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA 12480
ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC TATCGTCTTG AGTCCAACCC 12540
GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT AACAGGATTA GCAGAGCGAG 12600
GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT AACTACGGCT ACACTAGAAG 12660
GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG 12720
CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA 12780
GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG ATCTTTTCTA CGGGGTCTGA 12840
CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC ATGAGATTAT CAAAAAGGAT 12900
CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA TCAATCTAAA GTATATATGA 12960
GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG GCACCTATCT CAGCGATCTG 13020
TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG TAGATAACTA CGATACGGGA 13080
GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA GACCCACGCT CACCGGCTCC 13140
AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG CGCAGAAGTG GTCCTGCAAC 13200
TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA GCTAGAGTAA GTAGTTCGCC 13260
AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC ATCGTGGTGT CACGCTCGTC 13320
GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA AGGCGAGTTA CATGATCCCC 13380
CATGTTCTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG ATCGTTGTCA GAAGTAAGTT 13440
GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT AATTCTCTTA CTGTCATGCC 13500
ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC AAGTCATTCT GAGAATAGTG 13560
TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG GATAATACCG CGCCACATAG 13620
CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG GGGCGAAAAC TCTCAAGGAT 13680
CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT GCACCCAACT GATCTTCAGC 13740
ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA GGAAGGCAAA ATGCCGCAAA 13800
AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA CTCTTCCTTT TTCAATATTA 13860
TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC ATATTTGAAT GTATTTAGAA 13920
AAATAAACAA ATAGGGGTTC CSCGCACATT TCCCCGAAAA GTGCCACCTG ACGTCTAAGA 13980
AACCATTATT ATCATGACAT TAACCTATAA AAATAGGCGT ATCACGAGGC CCTTTCGTCT 14040
TCAAGAACTG CCTCGCGCGT 7TCGGTGATG ACGGTGAAAA CCTCTGACAC ATGCAGCTCC 14100
CGGAGACGGT CACAGCTTGT CTGTAAGCGG ATGCCGGGAG CAGACAAGCC CGTCAGGGCG 14160
CGTCAGCGGG TGTTGGCGGG TGTCGGGGCG CAGCCATGAC CCAGTCACGT AGCGATAGCG 14220
GAGTGTACTG GCTTAACTAT GCGGCATCAG AGCAGATTGT ACTGAGAGTG CACCATATGC 14280
GGTGTGAAAT ACCGCACAGA TGCGTAAGGA GAAAATACCG CATCAGGCGC CATTCGCCAT 14340
TCAGGCTGCG CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC 14400
GCGGGGAGGC AGAGATTGCA GTAAGCTGAG ATCGCAGCAC TGCACTCCAG CCTGGGCGAC 14460
AGAGTAAGAC TCTGTCTCAA AAATAAAATA AATAAATCAA TCAGATATTC CAATCTTTTC 14520
CTTTATTTAT TTATTTATTT TCTATTTTGG AAACACAGTC CTTCCTTATT CCAGAATTAC 14580
ACATATATTC TATTTTTCTT TATATGCTCC AGTTTTTTTT AGACCTTCAC CTGAAATGTG 14640
TGTATACAAA ATCTAGGCCA GTCCAGCAGA GCCTAAAGGT AAAAAATAAA ATAATAAAAA 14700
ATAAATAAAA TCTAGCTCAC TCCTTCACAT CAAAATGGAG ATACAGCTGT TAGCATTAAA 14760
TACCAAATAA CCCATCTTGT CCTCAATAAT TTTAAGCGCC TCTCTCCACC ACATCTAACT 14820
CCTGTCAAAG GCATGTGCCC CTTCCGGGCG CTCTGCTGTG CTGCCAACCA ACTGGCATGT 14880
GGACTCTGCA GGGTCCCTAA CTGCCAAGCC CCACAGTGTG CCCTGAGGCT GCCCCTTCCT 14940
TCTAGCGGCT GCCCCCACTC GGCTTTGCTT TCCCTAGTTT CAGTTACTTG CGTTCAGCCA 15000
AGGTCTGAAA CTAGGTGCGC ACAGAGCGGT AAGACTGCGA GAGAAAGAGA CCAGCTTTAC 15060
AGGGGGTTTA TCACAGTGCA CCCTGACAGT CGTCAGCCTC ACAGGGGGTT TATCACATTG 15120
CACCCTGACA GTCGTCAGCC TCACAGGGGG TTTATCACAG TGCACCCTTA CAATCATTCC 15180
ATTTGATTCA CAATTTTTTT AGTCTCTACT GTGCCTAACT TGTAAGTTAA ATTTGATCAG 15240
AGGTGTGTTC CCAGAGGGGA AAACAGTATA TACAGGGTTC AGTACTATCG CATTTCAGGC 15300
CTCCACCTGG GTCTTGGAAT GTGTCCCCCG AGGGGTGATG ACTACCTCAG TTGGATCTCC 15360
ACAGGTCACA GTGACACAAG ATAACCAAGA CACCTCCCAA GGCTACCACA ATGGGCCGCC 15420
CTCCACGTGC ACATGGCCGG AGGAACTGCC ATGTCGGAGG TGCAAGCACA CCTGCGCATC 15480
AGAGTCCTTG GTGTGGAGGG AGGGACCAGC GCAGCTTCCA GCCATCCACC TGATGAACAG 15540
AACCTAGGGA AAGCCCCAGT TCTACTTACA CCAGGAAAGG C 15581

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 74 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..74
(D) OTHER INFORMATION: /note= "primer #17982"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GGGGCGTACG GAGCGCTCCG AATTCGGTAC CGTTTAAACG GGCCCTCTCG AGTCCGTTGT
ACAGTTCATC CATG
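
As a quick consistency check on SEQ ID NO: 36, the following Python sketch confirms that the printed primer is 74 bases long and that the reverse complement of its 3' end occurs in the vector listing above (the row ending at position 9600). The helper name `reverse_complement`, the variable names, and the choice to inspect the last 28 bases are illustrative assumptions and are not part of the listing.

```python
# Primer #17982 (SEQ ID NO: 36), copied from the listing with spaces removed.
PRIMER_17982 = (
    "GGGGCGTACGGAGCGCTCCGAATTCGGTACCGTTTAAACGGGCCCTCTCGAGTCCGTTGT"
    "ACAGTTCATCCATG"
)

# Row of the vector listing that ends at position 9600, spaces removed.
VECTOR_ROW_9600 = (
    "CAGCTGCTGGGATTACACATGGCATGGATGAACTGTACAACGGACTCGAGACCTAGAAAA"
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    tail = PRIMER_17982[-28:]  # illustrative choice of 3' segment
    print("primer length:", len(PRIMER_17982))
    print("3' tail reverse complement:", reverse_complement(tail))
    print("found in vector row:", reverse_complement(tail) in VECTOR_ROW_9600)
```

The match indicates that the 3' portion of the primer is complementary to the strand printed in the listing, which is what one would expect of a reverse primer; the listing itself does not state the primer's intended use.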



(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 66 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..66
(D) OTHER INFORMATION: /note= "primer #17983"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GGGGGAATTC GCGCGCGTAC GTAAGCGCTA GCTGAGCAAG AAATGGCTAG CAAAGGAGAA
GAACTC
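
The sketch below, in Python, checks the stated length of SEQ ID NO: 37 and scans the primer for a few common restriction enzyme recognition sequences. The enzyme table (`SITES`) and the function name `find_sites` are assumptions chosen for illustration; the listing does not say which sites, if any, were deliberately designed into the primer.

```python
# Primer #17983 (SEQ ID NO: 37), copied from the listing with spaces removed.
PRIMER_17983 = (
    "GGGGGAATTCGCGCGCGTACGTAAGCGCTAGCTGAGCAAGAAATGGCTAGCAAAGGAGAA"
    "GAACTC"
)

# Recognition sequences of a few widely used enzymes (illustrative selection).
SITES = {
    "EcoRI": "GAATTC",
    "NheI": "GCTAGC",
    "XhoI": "CTCGAG",
    "KpnI": "GGTACC",
    "BsiWI": "CGTACG",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map each enzyme name to the 1-based start positions of its site in seq."""
    hits: dict[str, list[int]] = {}
    for name, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)  # report 1-based positions
            start = seq.find(site, start + 1)
        hits[name] = positions
    return hits


if __name__ == "__main__":
    print("primer length:", len(PRIMER_17983))
    for enzyme, positions in find_sites(PRIMER_17983).items():
        print(enzyme, positions)
```

The same function can be applied to primer #17982 or to any row of the vector listing once the spaces are removed.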



WHAT IS CLAIMED IS:



1. An isolated nucleic acid that encodes an engineered
Aequorea victoria fluorescent protein, wherein the protein
encoded by the isolated nucleic acid is selected from the
group that consists of:
a. a protein that has leucine at amino acid position 65, and
wherein said protein has a cellular fluorescence that is at
least five times greater than the cellular fluorescence of
wild type Aequorea victoria green fluorescent protein;
b. a protein that has leucine at amino acid position 65 and
threonine at position 168, and wherein said protein has a
cellular fluorescence that is at least five times greater
than wild type Aequorea victoria green fluorescent protein;
c. a protein that has leucine at amino acid position 65,
threonine at position 168, and cysteine at position 66,
wherein said protein has a cellular fluorescence that is at
least five times greater than the cellular fluorescence of
wild type Aequorea victoria green fluorescent protein;
d. a blue fluorescent protein that has histidine at amino
acid position 67, leucine at position 65 and has a cellular
fluorescence that is at least five times greater than that
of BFP(Tyr 67→His);
e. a blue fluorescent protein that has histidine at amino
acid position 67, alanine at amino acid position 164 and has
a cellular fluorescence that is at least five times greater
than that of BFP(Tyr 67→His);
f. a blue fluorescent protein that has histidine at amino
acid position 67, leucine at amino acid position 65, alanine
at amino acid position 164 and has a cellular fluorescence
that is at least five times greater than that of
BFP(Tyr 67→His).

2. An isolated nucleic acid of claim 1, which encodes an
engineered Aequorea victoria green fluorescent protein
("GFP") having a cellular fluorescence that is at least five
times greater than that of wild type GFP, wherein the
engineered GFP has a leucine at amino acid position 65.

3. An isolated nucleic acid according to claim 2, wherein
the nucleic acid further encodes a threonine at amino acid
position 168.

4. An isolated nucleic acid according to claim 3, wherein
the nucleic acid further encodes a cysteine at amino acid
position 66.

5. An isolated nucleic acid of claim 1 that encodes an
engineered blue fluorescent protein ("BFP") that has
histidine at amino acid position 67 and leucine at position
65, and has a cellular fluorescence that is at least five
times greater than that of BFP(Tyr 67→His).

6. An isolated nucleic acid of claim 1 that encodes an
engineered blue fluorescent protein ("BFP") that has
histidine at amino acid position 67 and alanine at amino
acid position 164, and has a cellular fluorescence that is
at least five times greater than that of BFP(Tyr 67→His).

7. An isolated nucleic acid according to claim 6, wherein
the nucleic acid further encodes leucine at amino acid
position 65.

8. A transformed cell that expresses a protein encoded by a
nucleic acid of claim 1.

9. A vector comprising a nucleic acid of claim 1.

10. A transformed cell comprising a vector of claim 9.

11. A transformed cell that expresses a protein encoded by
the nucleic acid of claim 1 fused to a protein encoded by a
second nucleic acid of interest.

12. An isolated engineered Aequorea victoria green
fluorescent protein ("GFP") wherein the engineered GFP
comprises leucine at amino acid position 65, said engineered
GFP having a cellular fluorescence that is at least five
times greater than wild type GFP.

13. An isolated engineered Aequorea victoria green
fluorescent protein ("GFP") according to claim 12, wherein
the engineered GFP has threonine at amino acid position 168.

14. An isolated engineered Aequorea victoria green
fluorescent protein ("GFP") according to claim 13, wherein
the engineered GFP has cysteine at amino acid position 66.

15. An isolated blue fluorescent protein ("BFP") that
comprises histidine at amino acid position 67 and leucine at
amino acid position 65 and has a cellular fluorescence that
is at least five times greater than that of
BFP(Tyr 67→His).

16. An isolated blue fluorescent protein ("BFP") that has a
histidine at amino acid position 67 and an alanine at amino
acid position 164, that has a cellular fluorescence that is
at least five times greater than that of BFP(Tyr 67→His).

17. An isolated blue fluorescent protein ("BFP") according
to claim 16, wherein the BFP further has leucine at amino
acid position 65.



18. A method of detecting and optionally isolating an
engineered cell that contains a selected nucleic acid which
encodes a selected protein or nucleic acid, comprising:
a) stably introducing into a host cell in a population of
host cells a vector that contains a first nucleic acid which
encodes a polypeptide selected from the group consisting of
SG11, SG12, SG25, SB42, SB49, SB50 and a second nucleic acid
which encodes a selected protein or nucleic acid, and
b) detecting cells in the population of host cells that
express SG11, SG12, SG25, SB42, SB49, or SB50, and
c) optionally sorting cells that express SG11, SG12, SG25,
SB42, SB49, or SB50 with a fluorescence-activated cell
sorter to isolate individual cells that express said
fluorescent protein.

19. A nucleic acid construct wherein a coding sequence
selected from the group consisting of sequences that encode
SG11, SG12, SG25, SB42, SB49, and SB50 is operably linked to
a regulatory sequence of a selected gene.

20. A nucleic acid construct wherein a first coding sequence
that encodes a selected polypeptide is fused using genetic
engineering to a second coding sequence selected from the
group consisting of sequences that encode SG11, SG12, SG25,
SB42, SB49, and SB50, such that expression of the fused
sequence yields a fluorescent hybrid protein in which the
polypeptide encoded by the first coding sequence is fused to
the polypeptide encoded by the second coding sequence.

21. A method of detecting and characterizing regulatory and
coding sequence elements that regulate subcellular
expression and targeting of proteins, comprising:
a) expressing in an engineered cell, in the presence and
absence of selected culture conditions and components, a
nucleic acid wherein a first nucleic acid selected from the
group consisting of nucleic acids that encode SG11, SG12,
SG25, SB42, SB49, and SB50 is operably linked to a second
nucleic acid derived from a selected gene;
b) detecting the presence and subcellular localization of
fluorescent signal.